A FUNCTIONAL EQUATION IN THE THEORY OF DYNAMIC PROGRAMMING

AND ITS GENERALIZATIONS

Richard Bellman and Sherman Lehman

§1. Introduction.

We propose in this paper to study a particular functional
equation

$$f(x,y) = \max\bigl[\,p_1\bigl(r_1 x + f((1-r_1)x,\,y)\bigr),\; p_2\bigl(r_2 y + f(x,\,(1-r_2)y)\bigr)\bigr], \qquad x, y \ge 0, \tag{1.1}$$

together with some of its generalizations and extensions.

The equation arises, as we shall show in the following section,
in the following way: Let us assume that we possess two gold mines,
Anaconda, which possesses an amount of gold in quantity $x$, and
Bonanza, which possesses an amount $y$, together with one gold-mining
machine. If the machine is used in the Anaconda mine, there is a
probability $p_1$ that $r_1 x$ of the gold will be mined without damaging
the machine, which means that the operation can be continued, and
a probability $(1-p_1)$ that the machine will be damaged beyond repair
and mine no gold. Similarly, the Bonanza mine has associated the
probabilities $p_2$ and $(1-p_2)$ and the quantity $r_2 y$. The problem is
to determine the course of action which will maximize the expected
amount of gold mined before the machine is damaged.

If we allow for a greater variety of outcomes, we obtain an
extension of (1.1), namely,


(a)  x,y  >  0 

(b)  0  i  1  1  '  (1.5) 

(o)  ^  0,  O^(oo)  <  1,  1-1, 2, 

The general problem will be one involving a number of different
mines together with a number of machines of different performance.
There is no difficulty in deriving similar, but more complicated
equations:



$$f(p) = \max_{q} \int \bigl[\,g(p,q,r) + f(T(p,q,r))\,\bigr]\, dG(r), \tag{1.6}$$

where $p$ is a point, $(x_1, x_2, \ldots, x_n)$, and $T(p,q,r)$ is a transformed
point.


We shall begin our discussion by establishing an existence and
uniqueness theorem which, while not nearly the most general which
may be obtained, illustrates very clearly the methods that may be
used. The basic method is, of course, that of successive
approximations. We also discuss the dependence of $f(p)$ upon parameters
appearing in $g$ and $T$.

As might be expected from the nonlinear nature of the functional
equations, the solution of these equations is, in general, quite
difficult to obtain or describe. Up to the present, only a handful
have been completely resolved. In what follows, we shall consider
(1.1), (1.2), and some immediate generalizations. The case where
only a finite number of operations are permitted will also be
treated.

Turning from the problem of maximizing the expected return,
we shall consider the more general question of maximizing the
expected value of some function of the return. The simplest
analogue of (1.1) is then

$$f(x,y,t) = \max\bigl[\,p_1 f((1-r_1)x,\,y,\,t+r_1 x) + (1-p_1)\phi(t),\; p_2 f(x,\,(1-r_2)y,\,t+r_2 y) + (1-p_2)\phi(t)\bigr]. \tag{1.7}$$

Under certain assumptions concerning $\phi(t)$, this equation can
be solved, possessing a solution similar to that of (1.1). A
particularly important case is that where $\phi(t) - b > 0$. The
asymptotic form of $f(x,y,t)$ as $x, y \to \infty$ can then be obtained.

It can be shown by means of counter-examples, cf. [8], that
the difficulties encountered in the discrete formulation generating
the preceding equations are due to the intricate form of the
solution and that simple solutions, possessing an intuitive origin, are
not to be obtained in all cases.

To overcome some of these difficulties sufficiently to obtain
some approximate knowledge concerning the solutions, we have
introduced continuous versions of the problems. These lead to problems
in the calculus of variations which are fortunately sufficiently
nonlinear to be susceptible to a variational attack. The problems
are, however, not completely straightforward and require a
non-classical type of argumentation.

Guided by our previous results, we consider in turn the
two-choice, the three-choice, nonlinear utility and two-choice,
finite-time problems, obtaining complete solutions.

We have treated only particular, simple cases of the
continuous versions in order not to enmesh ourselves in conceptual
difficulties. The general formulation requires a separate treatment
which will be given elsewhere.

The central problem we have discussed in this paper is a
particular maximization problem connected with multi-stage processes
of deterministic and stochastic type. The general theory of these
processes constitutes the theory of dynamic programming which has
been discussed in a number of recent papers, [1] - [7].

§2. Mathematical Formulation.

To derive (1.1), let us set

$f(x,y)$ = expected amount of gold mined before the machine
is damaged when A has an amount $x$, B has an amount $y$,
and an optimal policy is pursued. (2.1)

If we choose to mine Anaconda, an operation we shall denote
by A, with probability $p_1$ we obtain $r_1 x$ and the privilege of
continuing; while with probability $(1-p_1)$ we obtain nothing. Since
an optimal policy must have an optimal continuation, the expected
return from an A-choice will be

$$E_A = p_1\bigl(r_1 x + f((1-r_1)x,\,y)\bigr). \tag{2.2}$$

Similarly, the expected return from a B-choice is

$$E_B = p_2\bigl(r_2 y + f(x,\,(1-r_2)y)\bigr). \tag{2.3}$$

Since our purpose is to maximize the expected return, we have

$$f(x,y) = \max(E_A, E_B), \tag{2.4}$$

which is precisely (1.1).
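The recursion (2.2)-(2.4) can be explored numerically. The following is a minimal sketch, not part of the original paper: it truncates the process to at most `stages` operations (an assumption made so that the recursion terminates), and the function name `expected_gold` and its argument names are hypothetical. Because each operation only rescales one of the two amounts, the state can be indexed by the number of A- and B-operations already performed.

```python
from functools import lru_cache


def expected_gold(x, y, p1, r1, p2, r2, stages):
    """Expected gold when at most `stages` operations are allowed.

    Illustrative N-stage truncation of (1.1); as `stages` grows the value
    approaches the solution f(x, y) of the functional equation.
    """

    @lru_cache(maxsize=None)
    def f(i, j, n):
        # After i uses of A and j uses of B the mines hold
        # (1 - r1)**i * x and (1 - r2)**j * y respectively.
        if n == 0:
            return 0.0
        xa = (1 - r1) ** i * x
        yb = (1 - r2) ** j * y
        e_a = p1 * (r1 * xa + f(i + 1, j, n - 1))  # E_A of (2.2)
        e_b = p2 * (r2 * yb + f(i, j + 1, n - 1))  # E_B of (2.3)
        return max(e_a, e_b)  # (2.4)

    return f(0, 0, stages)
```

With $y = 0$ only the A-choice is worthwhile, and the truncated value should approach $p_1 r_1 x / (1 - p_1(1-r_1))$, which gives a quick sanity check on the sketch.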

We can increase the possibilities without increasing the
complexity of the equation. Let us assume that an A-choice has the
following probabilities associated:

(a) $p_1$ = probability of obtaining $r_1 x$ and continuing

(b) $p_2$ = probability of obtaining 0 and continuing

(c) $p_3$ = probability of obtaining $x$ and continuing (2.5)

(d) $p_4$ = probability of obtaining 0 and terminating
the process

In a like manner, let B have the probabilities $q_1, q_2, q_3, q_4$
attached to its choice. Then we obtain


$$f(x,y) = \max \begin{cases} A: & p_1\bigl[r_1 x + f((1-r_1)x,\,y)\bigr] + p_2 f(x,y) + p_3\bigl[x + f(0,y)\bigr], \\ B: & q_1\bigl[r_2 y + f(x,\,(1-r_2)y)\bigr] + q_2 f(x,y) + q_3\bigl[y + f(x,0)\bigr]. \end{cases} \tag{2.6}$$


Since

$$f(x,0) = p_1\bigl[r_1 x + f((1-r_1)x,\,0)\bigr] + p_2 f(x,0) + p_3 x, \tag{2.7}$$

we have, setting $f(x,0) = c_1 x$,

$$f(x,0) = \frac{(p_1 r_1 + p_3)\,x}{1 - p_2 - p_1(1-r_1)}, \tag{2.8}$$


and a corresponding expression for $f(0,y)$. The equation in (2.6)
reduces to

$$f(x,y) = \max \begin{cases} A: & c_{11} x + c_{21} y + p_1' f((1-r_1)x,\,y), \\ B: & c_{12} x + c_{22} y + q_1' f(x,\,(1-r_2)y), \end{cases} \tag{2.9}$$

using (2.8) and the corresponding expression for $f(0,y)$ and
solving for $f(x,y)$, where the $c_{ij}$, $p_1'$, and $q_1'$ are readily determined positive
constants. The treatment of (2.9) is of the same order of difficulty
as that of (1.1).

Let us now derive (1.7). Consider the same model as above
in §1 and assume that we wish to maximize the expected value of
$\phi(R)$, where $\phi$ is a given function and $R$ is the total return obtained
before the mining machine is damaged.

Setting

$f(x,y,a)$ = expected value of $\phi(R)$ obtained when A has $x$ and (2.10)
B has $y$ with an amount $a$ already mined, using an
optimal policy,

we obtain, via the same argument as above, the functional equation
of (1.7).

§3. Existence and Uniqueness.

Our first result is
Theorem 1. Consider the equation

$$f(p) = \max_{1 \le k \le m} \bigl[\,g_k(p) + h_k(p)\, f(T_k p)\,\bigr], \tag{3.1}$$

where we shall assume

(a) The point $p$ is restricted to a region $R$
with the property that $p \in R$ implies that $T_k p \in R$,
(b) $|g_k(p)| \le c_1$ for $p \in R$, (3.2)
(c) $|h_k(p)| \le c_2 < 1$ for $p \in R$.

Under these conditions there is a unique bounded solution to
(3.1).

Proof: Let $f_0(p)$ be an arbitrary bounded function for $p \in R$. Define

$$f_{n+1}(p) = \max_{1 \le k \le m} \bigl[\,g_k(p) + h_k(p)\, f_n(T_k p)\,\bigr], \qquad n = 0, 1, 2, \ldots. \tag{3.3}$$

Let $k = k(n)$, dependent also upon $p$, be a value of $k$ which
furnishes the maximum; then

$$f_{n+1}(p) = g_{k(n)}(p) + h_{k(n)}(p)\, f_n(T_{k(n)} p) \ge g_{k(n-1)}(p) + h_{k(n-1)}(p)\, f_n(T_{k(n-1)} p), \tag{3.4}$$

and similarly

$$f_n(p) = g_{k(n-1)}(p) + h_{k(n-1)}(p)\, f_{n-1}(T_{k(n-1)} p) \ge g_{k(n)}(p) + h_{k(n)}(p)\, f_{n-1}(T_{k(n)} p). \tag{3.5}$$

From these relations we obtain for $n \ge 1$,

$$h_{k(n-1)}(p)\bigl[f_n(T_{k(n-1)} p) - f_{n-1}(T_{k(n-1)} p)\bigr] \le f_{n+1}(p) - f_n(p) \le h_{k(n)}(p)\bigl[f_n(T_{k(n)} p) - f_{n-1}(T_{k(n)} p)\bigr]. \tag{3.6}$$

Let us define

$$u_n = \sup_{p \in R} |f_n(p) - f_{n-1}(p)|. \tag{3.7}$$

Using the bound given in (3.2c) we obtain from (3.6) the result

$$|f_{n+1}(p) - f_n(p)| \le c_2 u_n, \tag{3.8}$$

whence $u_{n+1} \le c_2 u_n$. This shows that the series $\sum_{n=1}^{\infty} u_n$ converges,
which means that

$$\sum_{n=0}^{\infty} \bigl[f_{n+1}(p) - f_n(p)\bigr] \tag{3.9}$$

converges uniformly for $p \in R$. Hence $f_n(p)$ converges uniformly
as $n \to \infty$ to a function $f(p)$, a solution of the functional equation.

To establish the uniqueness of a bounded solution we proceed
similarly. Let $F(p)$ be another solution of (3.1) and let $k$ be an
index which yields $f(p)$ and $m$ be an index which yields $F(p)$. Then,
as above,

$$f(p) = g_k(p) + h_k(p) f(T_k p) \ge g_m(p) + h_m(p) f(T_m p),$$
$$F(p) = g_m(p) + h_m(p) F(T_m p) \ge g_k(p) + h_k(p) F(T_k p), \tag{3.10}$$

whence

$$|f(p) - F(p)| \le \max\bigl\{\,|h_m(p)|\,|f(T_m p) - F(T_m p)|,\; |h_k(p)|\,|f(T_k p) - F(T_k p)|\,\bigr\}. \tag{3.11}$$

If we set

$$S = \sup_{R} |f(p) - F(p)|, \tag{3.12}$$

we obtain from (3.11) the inequality

$$|f(p) - F(p)| \le c_2 S. \tag{3.13}$$

If we take $p$ to be a point for which $|f(p) - F(p)| \ge S - \epsilon$, $\epsilon$ small,
we obtain a contradiction, unless $S = 0$. This establishes uniqueness.
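The contraction argument of the proof can be observed on a small example. The sketch below is illustrative only: it assumes a finite region $R$ whose points are indexed $0, 1, \ldots$, tabulates $g_k$, $h_k$, and $T_k$ as lists, and draws hypothetical random data satisfying (3.2). The helper names are not from the paper.

```python
import random


def bellman_step(g, h, T, f):
    """One application of the operator in (3.3) on a finite region.

    g[k][p] and h[k][p] tabulate g_k and h_k; T[k][p] is the index of T_k p.
    """
    return [
        max(g[k][p] + h[k][p] * f[T[k][p]] for k in range(len(g)))
        for p in range(len(f))
    ]


def successive_approximations(g, h, T, f0, iterations):
    """Return the final iterate and the sup-norm differences u_n of (3.7)."""
    f, diffs = list(f0), []
    for _ in range(iterations):
        nxt = bellman_step(g, h, T, f)
        diffs.append(max(abs(a - b) for a, b in zip(nxt, f)))
        f = nxt
    return f, diffs


def random_instance(points=20, choices=3, c1=1.0, c2=0.9, seed=0):
    """Hypothetical test data obeying |g_k| <= c1 and |h_k| <= c2 < 1."""
    rng = random.Random(seed)
    g = [[rng.uniform(-c1, c1) for _ in range(points)] for _ in range(choices)]
    h = [[rng.uniform(-c2, c2) for _ in range(points)] for _ in range(choices)]
    T = [[rng.randrange(points) for _ in range(points)] for _ in range(choices)]
    return g, h, T
```

Starting the iteration from two different bounded functions $f_0$ should give successive differences that shrink at least by the factor $c_2$, as in (3.8), and limits that agree, in line with the uniqueness statement.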

Let us observe that the uniform convergence demonstrated
above establishes the further result
Theorem 2. Under the conditions

(a) $g_k(p)$, $h_k(p)$, and $T_k(p)$ are continuous functions of $p$ in $R$ (3.14)

together with the previous conditions, $f(p)$ is a continuous function
of $p$ in $R$.

Furthermore, if $g_k(p)$, $h_k(p)$, and $T_k(p)$ are continuous functions of
a set of parameters, $q$, $f(p)$ will be a continuous function of
these parameters.

§4 .  Alternate  Proof  of  Existence. 

We have in the preceding section discussed the problem purely
from the analytic standpoint without regard for the underlying
processes. Let us now discuss the problem with regard to the basic
process, and consider the process where only $N$ stages will be
allowed. If we define, similarly to (2.1), $f_N(p)$ to be the maximum
return for $N$ stages, we obtain

$$f_1(p) = \max_{k}\, g_k(p), \tag{4.1}$$

and, generally,

$$f_{N+1}(p) = \max_{k} \bigl(g_k(p) + h_k(p)\, f_N(T_k p)\bigr). \tag{4.2}$$

Let us now assume that $g_k$ is actually a non-negative return and
that $h_k(p)$ is a probability.

It is clear then that $f_2 \ge f_1$ and thus generally that
$f_{n+1}(p) \ge f_n(p)$. If we set

$$u_n = \sup_{R} f_n(p), \tag{4.3}$$

we obtain from (4.2),

$$u_{n+1} \le c_1 + c_2 u_n, \tag{4.4}$$

which means that $u_n < c_1/(1-c_2)$. Since the $f_n$ are uniformly bounded
and monotone increasing, $f_n$ converges to $f(p)$, a solution.
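The monotone, bounded behaviour of the $N$-stage returns can be checked directly. This is a hedged sketch on a finite region, with hypothetical random data in which every $g_k$ is a non-negative return bounded by $c_1$ and every $h_k$ is a probability bounded by $c_2 < 1$; the function names are illustrative.

```python
import random


def n_stage_returns(g, h, T, stages):
    """Return [f_1, ..., f_stages] computed from (4.1) and (4.2)."""
    m, size = len(g), len(g[0])
    f = [max(g[k][p] for k in range(m)) for p in range(size)]  # (4.1)
    history = [f]
    for _ in range(stages - 1):
        f = [
            max(g[k][p] + h[k][p] * f[T[k][p]] for k in range(m))
            for p in range(size)
        ]  # (4.2)
        history.append(f)
    return history


def nonnegative_instance(points=15, choices=2, c1=2.0, c2=0.8, seed=1):
    """Hypothetical data: returns 0 <= g_k <= c1, probabilities 0 <= h_k <= c2."""
    rng = random.Random(seed)
    g = [[rng.uniform(0.0, c1) for _ in range(points)] for _ in range(choices)]
    h = [[rng.uniform(0.0, c2) for _ in range(points)] for _ in range(choices)]
    T = [[rng.randrange(points) for _ in range(points)] for _ in range(choices)]
    return g, h, T
```

Each list in the returned history should dominate the previous one pointwise, and no entry should exceed $c_1/(1-c_2)$.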

§5. Approximation in Strategy Space.

The functional equation discussed in the previous section
effects a transliteration of a decision problem from the space
of policies, strategies, schedules, etc., to the space of functions.
This is its principal role.

The essence of the previous section was that an initial
guess in function space will, by the process of successive
iteration, eventually yield an arbitrarily close approximation
to the actual solution.

We may, however, instead of guessing an initial function,
guess an initial strategy $S$. For example, we may divide the
region $R$ into $m$ sub-regions, $R_1, R_2, \ldots, R_m$, possessing only
boundary points in common, and choose the $k$-th choice, i.e., set

$$f_S(p) = g_k(p) + h_k(p)\, f_S(T_k p) \tag{5.1}$$

whenever $p \in R_k$. For the points on the boundary of two or more
regions, we choose either index.

If $p \in R_k$, the transformed point $T_k p$ will belong to $R_l$, where
$l$ may or may not equal $k$. In any case, continuing in this way,
we can calculate an approximation to $f(p)$, $f_S(p)$, which we can then
improve by successive approximations as before.

The importance of this procedure lies in the fact that the
convergence, under the assumptions of the preceding section, will
always be monotone. This is of great importance in practical
applications.

To show this monotonicity, let $f_2(p)$ be the second
approximation. Then

$$f_2(p) = \max_{1 \le k \le m} \bigl[\,g_k(p) + h_k(p)\, f_S(T_k p)\,\bigr]. \tag{5.2}$$

Comparing (5.1) and (5.2), it is clear that $f_2(p) \ge f_S(p)$.
From this inequality, it follows inductively that $f_{n+1}(p) \ge f_n(p)$.
A  further  discussion.  In  connection  with  an  equation  of  different 
type,  will  be  found  In  . 
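The monotone improvement obtained from an initial strategy can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the region is taken to be finite, the data are hypothetical random returns and probabilities, the initial strategy is drawn at random, and $f_S$ of (5.1) is approximated by repeated substitution.

```python
import random


def policy_return(g, h, T, policy, sweeps=500):
    """Approximate f_S of (5.1) for a fixed strategy S: p -> policy[p]."""
    f = [0.0] * len(policy)
    for _ in range(sweeps):
        f = [g[k][p] + h[k][p] * f[T[k][p]] for p, k in enumerate(policy)]
    return f


def improve(g, h, T, f):
    """One maximization step, as in (5.2)."""
    return [
        max(g[k][p] + h[k][p] * f[T[k][p]] for k in range(len(g)))
        for p in range(len(f))
    ]


def demo(points=15, choices=3, c2=0.8, seed=2, steps=30):
    """Return [f_S, f_2, f_3, ...] for a randomly guessed initial strategy."""
    rng = random.Random(seed)
    g = [[rng.uniform(0.0, 1.0) for _ in range(points)] for _ in range(choices)]
    h = [[rng.uniform(0.0, c2) for _ in range(points)] for _ in range(choices)]
    T = [[rng.randrange(points) for _ in range(points)] for _ in range(choices)]
    policy = [rng.randrange(choices) for _ in range(points)]  # initial guess S
    iterates = [policy_return(g, h, T, policy)]
    for _ in range(steps):
        iterates.append(improve(g, h, T, iterates[-1]))
    return iterates
```

Each iterate returned by `demo` should dominate its predecessor pointwise, which is the monotonicity described above.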


§6. The Solution of (6.1).

We shall prove

Theorem 3. Consider the functional equation

$$f(x,y) = \max \begin{cases} A: & \sum_{k=1}^{N} p_k\bigl[c_k x + f(c_k' x,\, y)\bigr], \\ B: & \sum_{k=1}^{N} q_k\bigl[d_k y + f(x,\, d_k' y)\bigr], \end{cases} \tag{6.1}$$

where

(a) $p_k, q_k \ge 0$, $\sum_{k=1}^{N} p_k < 1$, $\sum_{k=1}^{N} q_k < 1$,
(b) $1 \ge c_k, d_k \ge 0$, $c_k' + c_k = 1$, $d_k' + d_k = 1$, (6.2)
(c) $x, y \ge 0$.

The optimal choice of operation is the following: If

$$\frac{\sum_{k} p_k c_k}{1 - \sum_{k} p_k}\, x > \frac{\sum_{k} q_k d_k}{1 - \sum_{k} q_k}\, y, \tag{6.3}$$

choose A; if the reverse inequality holds, choose B. In case of
equality, either choice is satisfactory.

To simplify the notation and the algebra, let us consider
first the simpler form of (6.1) given by (2.6) and assume that
$p_2 = q_2 = 0$. The resulting equation, after relabeling the probabilities, is

$$f(x,y) = \max \begin{cases} A: & p_1\bigl[x + f(0,y)\bigr] + p_2\bigl[r_1 x + f((1-r_1)x,\,y)\bigr], \\ B: & q_1\bigl[y + f(x,0)\bigr] + q_2\bigl[r_2 y + f(x,\,(1-r_2)y)\bigr]. \end{cases} \tag{6.4}$$

As noted above, we already know from Section 3 that there is a
unique solution to this equation. Let us turn, then, to a
discussion of some of the simpler properties of $f(x,y)$. Since
$p_1 + p_2 < 1$, $q_1 + q_2 < 1$, it follows that $f(0,0) = 0$. From the
fact that $f(kx,ky)$ and $kf(x,y)$ satisfy the same equation for
$k > 0$, it follows that $f(kx,ky) = kf(x,y)$, for $k > 0$. Setting
$y = 0$ and using $f((1-r_1)x, 0) = (1-r_1) f(x,0)$, we obtain

$$f(x,0) = \max \begin{cases} A: & (p_1 + p_2 r_1)x + p_2(1-r_1) f(x,0), \\ B: & (q_1 + q_2) f(x,0), \end{cases} \;=\; (p_1 + p_2 r_1)x + p_2(1-r_1) f(x,0), \tag{6.5}$$

whence

$$f(x,0) = \frac{(p_1 + p_2 r_1)\,x}{1 - p_2(1-r_1)}, \tag{6.6}$$

and, similarly,

$$f(0,y) = \frac{(q_1 + q_2 r_2)\,y}{1 - q_2(1-r_2)}. \tag{6.7}$$

These results are, of course, obvious if we consider the
process generating the function. On these grounds we should also
suspect that A would be employed whenever $y$ was sufficiently
small compared with $x$. This fact follows from the continuity of
$f(x,y)$ (compare Section 3), since the inequality

$$f(x,y) > (q_1 + q_2 r_2)\,y + q_1 f(x,0) + q_2 f(x,\,(1-r_2)y) \tag{6.8}$$

must hold for small positive $y = y(x)$, for $x > 0$, since it is
valid for $y = 0$.

It follows that there are two regions, close to the x and y
axes, in which the optimal choices are, respectively, A and B,
whenever (x,y) is contained in either of these regions, as shown in
Fig. 2.

It is reasonable to suppose that the solution has the form
shown in Fig. 1. The meaning of Fig. 1 is that A is employed
whenever (x,y) is in $R_A$, the region between the x-axis and L, and B is
employed in the complementary region. On the line L either A or
B may be used.

Fig. 2

That the boundary curve, if it exists, must be a straight line
follows from the homogeneity of $f(x,y)$.

Assuming that the solution
has this form, we shall show that the equation of L may be
calculated from the fact that it is an indifference curve. By this
we mean that for points (x,y) on the curve, the value of the
function $f(x,y)$ is the same whether we employ A or B.

Observe that the effect of employing A is always to drive P
into $R_B$, whereas the use of B sends P into $R_A$. Consequently, if
A is used at P, the next choice, in an optimal policy, must be B,
and vice versa if B is used.

This alone would not be sufficient to determine L, were it
not for another fact. Since the operations A and B operate on x
and y alone, there will be a certain symmetry in the results obtained
by using A and then B, or B and then A, which plays a decisive
role in the solution.

Let us now do a small amount of computing. Using the values of
$f(x,0)$ and $f(0,y)$ obtained above, we have

$$f(x,y) = \max \begin{cases} A: & (p_1 + p_2 r_1)\,x + \dfrac{p_1 (q_1 + q_2 r_2)\,y}{1 - q_2(1-r_2)} + p_2 f((1-r_1)x,\,y), \\[2ex] B: & (q_1 + q_2 r_2)\,y + \dfrac{q_1 (p_1 + p_2 r_1)\,x}{1 - p_2(1-r_1)} + q_2 f(x,\,(1-r_2)y). \end{cases} \tag{6.9}$$

To simplify the notation, let us denote the coefficients of x and y
in the above equation by $\alpha_1, \alpha_2$ in A and by $\beta_1, \beta_2$ in B. If we employ
A, we obtain, using an obvious notation,

$$f_A(x,y) = \alpha_1 x + \alpha_2 y + p_2 f((1-r_1)x,\,y). \tag{6.10}$$

Following this by B, we obtain

$$f_{AB}(x,y) = \bigl(\alpha_1 + p_2(1-r_1)\beta_1\bigr)x + (\alpha_2 + p_2\beta_2)\,y + p_2 q_2 f((1-r_1)x,\,(1-r_2)y). \tag{6.11}$$

Similarly, the result of B and then A is

$$f_{BA}(x,y) = (\beta_1 + q_2\alpha_1)\,x + \bigl(\beta_2 + q_2\alpha_2(1-r_2)\bigr)y + p_2 q_2 f((1-r_1)x,\,(1-r_2)y). \tag{6.12}$$

If (x,y) lies upon L, we must have $f_{AB} = f_{BA}$. Equating the
two expressions, we observe that the unknown function $f((1-r_1)x,\,(1-r_2)y)$
disappears. Consequently, we obtain for L the equation

$$\bigl[\alpha_1(1-q_2) + \beta_1\bigl(p_2(1-r_1) - 1\bigr)\bigr]x = \bigl[\alpha_2\bigl(q_2(1-r_2) - 1\bigr) + \beta_2(1-p_2)\bigr]y. \tag{6.13}$$

Using the precise values of $\alpha_1, \alpha_2, \beta_1, \beta_2$ given by (6.9), we finally
obtain, as the equation of L,

$$\frac{(p_1 + p_2 r_1)\,x}{1 - p_1 - p_2} = \frac{(q_1 + q_2 r_2)\,y}{1 - q_1 - q_2}. \tag{6.14}$$

This is a remarkably simple equation, since, as we observe,
the coefficient of x depends only on the A operation, while the
coefficient of y depends only on the B operation. Furthermore, each
coefficient admits of a very simple interpretation as the ratio of
the expected yield of the operation to the probability of termination
of the process.

Let us insert a word of warning: Although this elegant
result holds for some generalizations of the functional equation,
it does not hold in general, as we shall subsequently see.

Let us now prove that the solution actually has this simple
form. To make the previous argument rigorous, we observe that
below L, the procedure consisting of A, B, and an optimal
continuation is superior to B, A, and an optimal continuation, and that the
reverse is true above L. Referring to Fig. 2, let Q(x,y) be a point
above the known A-region, and far enough below L so that any
outcome of a B-choice transforms Q(x,y) into the known A-region.

To show that A is used at Q, we argue by contradiction.
Suppose that B were used; then the next choice would necessarily be A.
However, we have seen above that below L the procedure consisting
of B, A, and an optimal continuation is inferior to A, B, and an
optimal continuation. Hence, A is used at Q. It is clear that we
may continue this argument until we have demonstrated that the
region between L and the x-axis is an A-region. Similarly,
starting from the known B-region, we may demonstrate that the region
above L is a B-region.

We have carried through the proof for the simplest case of
(6.1). There is no difficulty in verifying that the argument is
general.

Geometrically, the pattern is as follows: When (x,y) is in
$R_A$, A is employed until the resultant point is in $R_B$, at which time
B is employed until the point is again in $R_A$, and so on.
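The decision rule obtained from (6.14) can be compared against a direct numerical treatment of (6.4). The following sketch is an illustration only: it replaces the infinite process by an $N$-stage truncation (an assumption), and the names `make_solver` and `rule_prefers_a`, as well as the parameter values used to exercise them, are hypothetical.

```python
from functools import lru_cache


def make_solver(p1, p2, r1, q1, q2, r2):
    """N-stage version of (6.4); returns a function giving (E_A, E_B)."""

    @lru_cache(maxsize=None)
    def branches(x, y, n):
        if n == 0:
            return 0.0, 0.0
        e_a = p1 * (x + value(0.0, y, n - 1)) + p2 * (
            r1 * x + value((1 - r1) * x, y, n - 1)
        )
        e_b = q1 * (y + value(x, 0.0, n - 1)) + q2 * (
            r2 * y + value(x, (1 - r2) * y, n - 1)
        )
        return e_a, e_b

    def value(x, y, n):
        # Optimal n-stage return: the better of the two first choices.
        return max(branches(x, y, n))

    return branches


def rule_prefers_a(x, y, p1, p2, r1, q1, q2, r2):
    """Decision rule read off from (6.14): compare yield / termination ratios."""
    lhs = (p1 + p2 * r1) * x / (1 - p1 - p2)
    rhs = (q1 + q2 * r2) * y / (1 - q1 - q2)
    return lhs > rhs
```

For points well away from the line L and a moderately large number of stages, the sign of `E_A - E_B` returned by the solver should agree with `rule_prefers_a`.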




§7. A Generalization.

There is no difficulty in extending the above analysis to the
following n-dimensional equation

$$f(x_1, x_2, \ldots, x_n) = \max_{1 \le i \le n} \sum_{k=1}^{K} p_{ik}\bigl[c_{ik} x_i + f(x_1, x_2, \ldots, c_{ik}' x_i, \ldots, x_n)\bigr], \tag{7.1}$$

where

(a) $p_{ik} \ge 0$, $\sum_{k=1}^{K} p_{ik} < 1$, $i = 1, 2, \ldots, n$,
(b) $1 \ge c_{ik} \ge 0$, $c_{ik}' + c_{ik} = 1$, (7.2)
(c) $x_i \ge 0$.

The decision functions are again the ratios of expected gain
to probability of termination, namely,

$$D_i(x) = \frac{\sum_{k=1}^{K} p_{ik} c_{ik}}{1 - \sum_{k=1}^{K} p_{ik}}\; x_i. \tag{7.3}$$

If $\max_i D_i(x_i)$ is attained for $i = l$, then the $l$-th choice is
made unless there is equality, in which case any one of the
maximizing choices is optimal.
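The rule based on the decision functions (7.3) is simple to evaluate. The sketch below assumes the data are supplied as nested lists, with `p[i][k]` and `c[i][k]` standing for $p_{ik}$ and $c_{ik}$; the function names and the tie tolerance are illustrative choices, not part of the paper.

```python
def decision_values(x, p, c):
    """D_i of (7.3): expected gain over probability of termination.

    x[i] is the amount in mine i; p[i][k] and c[i][k] are the probabilities
    and yield fractions attached to the i-th choice.
    """
    values = []
    for xi, pi, ci in zip(x, p, c):
        expected_fraction = sum(pk * ck for pk, ck in zip(pi, ci))
        termination = 1.0 - sum(pi)
        values.append(expected_fraction * xi / termination)
    return values


def best_choices(x, p, c, tol=1e-12):
    """Indices attaining the maximum of D_i; ties mean any of them is optimal."""
    values = decision_values(x, p, c)
    top = max(values)
    return [i for i, v in enumerate(values) if top - v <= tol]
```

Returning a list of indices rather than a single index reflects the statement that, in case of equality, any of the maximizing choices may be made.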

§8. The Form of f(x,y).

Having obtained a very simple characterization of the optimal
policy, let us now turn our attention to the function $f(x,y)$. In
general, no simple analytic representation will exist. If, however,
we consider equation (6.9), which we write again as

$$f(x,y) = \max \begin{cases} A: & \alpha_1 x + \alpha_2 y + p_2 f(c_2 x,\, y), \\ B: & \beta_1 x + \beta_2 y + q_2 f(x,\, d_2 y), \end{cases} \qquad (c_2 = 1 - r_1,\; d_2 = 1 - r_2) \tag{8.1}$$

we shall show that if $c_2$ and $d_2$ are connected by a relation of the
type $c_2^m = d_2^n$, $m$ and $n$ being positive integers, we shall obtain
piecewise linear representations for $f(x,y)$.

It is sufficient, in order to illustrate the technique, to
consider the simplest case, $c_2 = d_2$.

Let (x,y) be a point in the A-region. If A is applied, either
(x,y) goes into (0,y), in which case B is used continually
thereafter, or it is transformed into $(c_2 x, y)$, which may be in either an
A- or a B-region. Let $L_1$ be the line that is transformed into L
when (x,y) goes into $(c_2 x, y)$, let $L_2$ be the line transformed into
$L_1$, and so on. Similarly, let $M_1$ be the line transformed into L
when (x,y) goes into $(x, d_2 y)$, and so on. In the sector $L O L_1$, A
is used first, followed by B, as shown in Fig. 3.

Hence, for (x,y) in this sector we obtain

$$\begin{aligned} f(x,y) &= \alpha_1 x + \alpha_2 y + p_2 f(c_2 x,\, y) \\ &= \alpha_1 x + \alpha_2 y + p_2(\beta_1 c_2 x + \beta_2 y) + p_2 q_2 f(c_2 x,\, c_2 y) \\ &= (\alpha_1 + p_2 \beta_1 c_2)\,x + (\alpha_2 + p_2 \beta_2)\,y + p_2 q_2 c_2 f(x,y). \end{aligned} \tag{8.2}$$

This yields

$$f(x,y) = \frac{(\alpha_1 + p_2 \beta_1 c_2)\,x + (\alpha_2 + p_2 \beta_2)\,y}{1 - p_2 q_2 c_2} \tag{8.3}$$

for (x,y) in $L O L_1$. Similarly, we obtain a linear expression for f
in $L O M_1$. Having obtained the representations in these sectors, it
is clear that we obtain linear expressions in $L_1 O L_2$, etc.
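The closed form (8.3) can be checked against a direct recursion. The sketch below is illustrative and rests on several assumptions: it takes $r_1 = r_2 = r$ so that $c_2 = d_2$, uses the values of $\alpha_i, \beta_i$ read off from (6.9), and approximates $f$ by an $N$-stage truncation of (6.4). The function names and test parameters are hypothetical.

```python
from functools import lru_cache


def sector_formula(x, y, p1, p2, q1, q2, r):
    """Closed form (8.3) in the sector LOL_1, for the case c2 = d2 = 1 - r."""
    c2 = 1.0 - r
    a1 = p1 + p2 * r                # alpha_1
    b2 = q1 + q2 * r                # beta_2
    a2 = p1 * b2 / (1.0 - q2 * c2)  # alpha_2, from (6.7) and (6.9)
    b1 = q1 * a1 / (1.0 - p2 * c2)  # beta_1, from (6.6) and (6.9)
    return ((a1 + p2 * b1 * c2) * x + (a2 + p2 * b2) * y) / (1.0 - p2 * q2 * c2)


def n_stage_value(x, y, p1, p2, q1, q2, r, stages):
    """Direct N-stage recursion for (6.4) with r1 = r2 = r, for comparison."""

    @lru_cache(maxsize=None)
    def f(u, v, n):
        if n == 0:
            return 0.0
        e_a = p1 * (u + f(0.0, v, n - 1)) + p2 * (r * u + f((1 - r) * u, v, n - 1))
        e_b = q1 * (v + f(u, 0.0, n - 1)) + q2 * (r * v + f(u, (1 - r) * v, n - 1))
        return max(e_a, e_b)

    return f(x, y, stages)
```

The comparison is meaningful only for points of the sector $L O L_1$, that is, points of the A-region that a successful A-operation carries into the B-region; for other sectors a different linear expression applies.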

§9. The Problem for a Finite Number of Stages.

Let us now consider the problem that arises when only a finite
number of stages are allowed. If we set

$f_N(x,y)$ = expected return using an optimal N-stage
policy, (9.1)

then

$$f_1(x,y) = \max\bigl[(p_1 + p_2 c_1)\,x,\; (q_1 + q_2 d_1)\,y\bigr], \tag{9.2}$$

$$f_{N+1}(x,y) = \max \begin{cases} A: & p_1\bigl[x + f_N(0,y)\bigr] + p_2\bigl[c_1 x + f_N(c_2 x,\, y)\bigr], \\ B: & q_1\bigl[y + f_N(x,0)\bigr] + q_2\bigl[d_1 y + f_N(x,\, d_2 y)\bigr]. \end{cases} \tag{9.3}$$

We know from the results concerning existence and uniqueness
in Section 3 that, as $N \to \infty$, $f_N(x,y) \to f(x,y)$. However, it
is not reasonable to suspect that for each N the optimal policy
will be that of $f(x,y)$. Furthermore, it is clear that, in general,
the policies will not be the same for N = 1.
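That the one-stage policy can differ from the policy of $f(x,y)$ is easy to see numerically. The following sketch evaluates the first optimal choice of the $N$-stage process from (9.2) and (9.3); the function name and the parameter values used to exercise it are hypothetical, chosen so that the one-stage dividing line and the line L are well separated.

```python
from functools import lru_cache


def first_choice(x, y, stages, p1, p2, c1, q1, q2, d1):
    """Optimal first operation ('A' or 'B') of the N-stage process (9.2)-(9.3)."""

    @lru_cache(maxsize=None)
    def f(u, v, n):
        if n == 0:
            return 0.0
        return max(branches(u, v, n))

    def branches(u, v, n):
        # With n = 1 these reduce to the two terms of (9.2).
        e_a = p1 * (u + f(0.0, v, n - 1)) + p2 * (c1 * u + f((1 - c1) * u, v, n - 1))
        e_b = q1 * (v + f(u, 0.0, n - 1)) + q2 * (d1 * v + f(u, (1 - d1) * v, n - 1))
        return e_a, e_b

    e_a, e_b = branches(x, y, stages)
    return "A" if e_a >= e_b else "B"
```

At a point lying between the one-stage dividing line and L, the first choice for $N = 1$ differs from the first choice for large $N$.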

It does, however, follow from our previous argumentation that
if for some N the decision regions of $f_N(x,y)$ and $f(x,y)$ coincide,
they must do so for all larger N.

Let us now show that the decision regions for $f_N$ converge toward
that of $f$ as $N \to \infty$, and that there will always be an $N_0$ with the
property that for $N \ge N_0$ the regions will coincide.

The  proof  Is  very  simple.  Consider  the  situation  f  ir  N  - 
as  In  Fig.  4. 


Fig. 4

Let $L_2(A^{-1})$ denote the line that is transformed into $L_2$ when
(x,y) goes into (cx,y). Let Q be in the sector between $L_2$ and
$L_2(A^{-1})$. If A is used at Q, then B is used next, since the
transformed point is in the $R_B$-region for $N - 1$ stages. Since Q is above L, we
know that AB is inferior to BA as a set of
first two choices. Hence B is used at Q. This shows us that the
B-region for the N-stage process is at least that containing the
sector bounded by the y-axis and $L_2(A^{-1})$. This process continues
until $L_2(A^{-k})$, for some k, lies below L, which must necessarily
occur after some finite number of stages.

The argument is general and applies to the general equations
discussed above. However, we cannot assert that the convergence
is monotone, as we suspect, until we know more about the A- and
B-regions for the N-stage process. It is probably true that there
are two regions for each N, but this is a result that has only
been demonstrated in the case of the simple equation (7.1).

To show this result, we use the fact that this equation arises
from a model in which the results of an operation are known only
as far as the expected outcome is concerned. Any N-stage policy
has the form, therefore,

$$S_N = A^{a_1} B^{b_1} A^{a_2} B^{b_2} \cdots, \tag{9.4}$$

where the $a_i$, $b_i$ are positive integers. This notation
means that the A-choice is made $a_1$ consecutive times, then the
B-choice $b_1$ consecutive times, and so on. There are now two
cases: $S_N$ is either equal to $A^N$ or $B^N$, or it has the form $A^k B \cdots$
or $B^k A \cdots$, where $k < N$.

Referring again to the figure, consider a point Q above L. If an
optimal policy has the form $A^k B \cdots$, $k < N$, which may be written
$A^{k-1}(AB)\cdots$, it may be improved by replacing AB with BA, since A
iterated any number of times maintains Q above L. It follows then
that in the region above L, either B is used first or A is used
repeatedly; and, similarly, in the region below L, either A is used
first or B is used repeatedly.

Since $A^N$ is clearly the optimal policy for points sufficiently
close to the x-axis, and $B^N$ is the optimal policy for points
sufficiently near the y-axis, it follows from the analytic form of
the yield for any $S_N$ (an expression which is linear in x and y) that
if $A^N$ is used at Q, it is used for all points below the line OQ,
and similarly for $B^N$, "below" being replaced by "above."

It follows that there are always two regions, separated either
by AB = BA or by a line of more complicated form, if $A^N$ or $B^N$ are
still dominant. For large N it is clear that $A^N$ and $B^N$ become less
and less influential, so that eventually AB = BA emerges as the
sole dividing line.

§10. A General Utility Function.

We have in the previous sections considered only the case in
which the utility of a total yield z was proportional to z. Let
us now turn to the more interesting case in which the utility is
measured by a function $\phi(z)$. The basic equation is now

$$f(x,y,a) = \max \begin{cases} A: & p_1 f(0,\,y,\,a+x) + p_2 f(c_2 x,\,y,\,a + c_1 x) + p_3 \phi(a), \\ B: & q_1 f(x,\,0,\,a+y) + q_2 f(x,\,d_2 y,\,a + d_1 y) + q_3 \phi(a), \end{cases} \tag{10.1}$$

$$f(0,0,a) = \phi(a),$$

where $c_1 + c_2 = 1$, $d_1 + d_2 = 1$, $p_i, q_i \ge 0$,
$p_1 + p_2 + p_3 = q_1 + q_2 + q_3 = 1$.

This equation is more difficult to treat than that occurring
for $\phi(z) = z$, and we shall only be able to present its
solution for certain classes of functions.


We have

$$f(0,y,a) = \max \begin{cases} A: & p_1 f(0,y,a) + p_2 f(0,y,a) + p_3 \phi(a), \\ B: & q_1 f(0,\,0,\,a+y) + q_2 f(0,\,d_2 y,\,a + d_1 y) + q_3 \phi(a). \end{cases} \tag{10.2}$$

Since $f(x,y,a) \ge f(0,0,a) = \phi(a)$ for $x, y \ge 0$, with strict inequality
if x or y is positive, it follows, since $p_1 + p_2 + p_3 = 1$, $p_3 > 0$, that

$$f(0,y,a) = q_1 \phi(a+y) + q_3 \phi(a) + q_2 f(0,\,d_2 y,\,a + d_1 y) \tag{10.3}$$

and, similarly, that

$$f(x,0,a) = p_1 \phi(a+x) + p_3 \phi(a) + p_2 f(c_2 x,\,0,\,a + c_1 x). \tag{10.4}$$

For given $\phi$, these equations may now be solved by iteration
for the functions $f(0,y,a)$ and $f(x,0,a)$.

Let us proceed formally before turning to a justification
of our operations. It is clear from the conservative nature of
the processes involved that the quantity x + y + a remains constant
throughout the sequence of operations. Consequently, the effect
of any choice is to transform a point in the region R: x + y + a = c,
x, y, a ≥ 0 into another point in the region, as shown in Fig. 5.

The problem that confronts us is that of determining the set
of points in R in which A is used and the set in which B is used.
If we assume, as before, that these sets constitute connected
regions having a boundary curve, we may proceed to find the
boundary as before, using the fact that the boundary is an
indifference curve.

Fig. 5

However, we must assume more about the boundary curve than
previously, where the fact that it was a straight line resulted
in considerable simplification. Let us assume that the result of
applying A to a point P on the boundary curve is to transform it
into the B-region, and vice versa.

Having provided ourselves with a cushion of assumptions, let
us now go through the calculations. If A is employed, we obtain

$$ f(x,y,a) = p_1 f(0,y,a+x) + p_2 f(c_2 x,y,a+c_1 x) + p_3 \phi(a). \tag{10.5} $$

Employing B at $(0,y,a+x)$ and $(c_2 x,y,a+c_1 x)$, we obtain

$$ \begin{aligned} f(x,y,a) = {} & p_1 [\, q_1 \phi(a+x+y) + q_2 f(0,d_2 y,a+x+d_1 y) + q_3 \phi(a+x) \,] \\ & + p_2 [\, q_1 f(c_2 x,0,a+c_1 x+y) + q_2 f(c_2 x,d_2 y,a+c_1 x+d_1 y) + q_3 \phi(a+c_1 x) \,] \\ & + p_3 \phi(a). \end{aligned} \tag{10.6} $$

A similar expression is obtained by using B and then A. Equating
the two, we obtain the equation of the boundary curve,

$$ p_1 q_3 \phi(a+x) + p_2 q_3 \phi(a+c_1 x) + p_3 \phi(a) = q_1 p_3 \phi(a+y) + q_2 p_3 \phi(a+d_1 y) + q_3 \phi(a), \tag{10.7} $$

which may be written

$$ p_1 q_3 [\phi(a+x)-\phi(a)] + p_2 q_3 [\phi(a+c_1 x)-\phi(a)] = q_1 p_3 [\phi(a+y)-\phi(a)] + q_2 p_3 [\phi(a+d_1 y)-\phi(a)]. \tag{10.8} $$


In order to establish the result rigorously, we must ascertain
whether or not the boundary curve has the desired transformation
property.

What we actually require is
Property T. If

$$ F(x,y,a) = p_1 q_3 [\phi(a+x)-\phi(a)] + p_2 q_3 [\phi(a+c_1 x)-\phi(a)] - q_1 p_3 [\phi(a+y)-\phi(a)] - q_2 p_3 [\phi(a+d_1 y)-\phi(a)] < 0, \tag{10.9} $$

then $F(c_2 x, y, a+c_1 x) < 0$. If $F(x,y,a) > 0$, then $F(x, d_2 y, a+d_1 y) > 0$.

Unfortunately, it seems to be difficult to present any simple
criterion which will insure that a general utility function $\phi(z)$
will satisfy Property T. It is not difficult to show, for example,
that $\phi(z) =$ [illegible] does not satisfy it for all values of $p_i$ and $q_i$.

Let us now demonstrate

Theorem 4. If

(a) $\phi(z)$ is strictly increasing and continuous, $\phi(z) \ge 0$,    (10.10)

(b) Property T is satisfied,

then the solution to (10.1) is given by

$$ f(x,y,a) = p_1 f(0,y,a+x) + p_2 f(c_2 x,y,a+c_1 x) + p_3 \phi(a) \tag{10.11} $$

for $F(x,y,a) \ge 0$, and by

$$ f(x,y,a) = q_1 f(x,0,a+y) + q_2 f(x,d_2 y,a+d_1 y) + q_3 \phi(a) \tag{10.12} $$

for $F(x,y,a) \le 0$.

The optimal policy is to apply A when F(x,y,a) > 0 and B if
F(x,y,a) < 0. When there is equality, it is a matter of indifference
as to which choice is made.
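The decision rule of Theorem 4 can be written down directly from (10.9). The sketch below is illustrative only: the function names are hypothetical, the numerical values are not from the paper, and the rule is only claimed to be optimal when Property T holds for the chosen $\phi$.

```python
def boundary_F(x, y, a, phi, p, q, c1, d1):
    """Evaluate F(x, y, a) of (10.9); p = (p1, p2, p3), q = (q1, q2, q3)."""
    p1, p2, p3 = p
    q1, q2, q3 = q
    base = phi(a)
    gain_A = p1 * (phi(a + x) - base) + p2 * (phi(a + c1 * x) - base)
    gain_B = q1 * (phi(a + y) - base) + q2 * (phi(a + d1 * y) - base)
    return q3 * gain_A - p3 * gain_B


def first_move(x, y, a, phi, p, q, c1, d1):
    """Return the move suggested by the sign of F (Theorem 4)."""
    F = boundary_F(x, y, a, phi, p, q, c1, d1)
    if F > 0:
        return "A"
    if F < 0:
        return "B"
    return "either"


# Illustrative parameters (assumed, not from the paper), linear utility.
P, Q = (0.3, 0.5, 0.2), (0.4, 0.4, 0.2)
move_1 = first_move(2.0, 1.0, 0.7, lambda z: z, P, Q, c1=0.4, d1=0.5)
move_2 = first_move(1.0, 1.0, 0.7, lambda z: z, P, Q, c1=0.4, d1=0.5)
```

With a linear utility the quantity a drops out and F reduces to $q_3(p_1 + p_2 c_1)x - p_3(q_1 + q_2 d_1)y$, so the boundary is a straight line through the origin, in agreement with the remark above that the earlier boundary was a straight line.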

Proof: The proof is carried through in two stages. First we show
that there is a region in the plane x+y+a = c where A is always used,
namely, a region close to y = 0. Then we consider what happens at a
point Q in the region defined by $F(x,y,a) \ge 0$, x+y+a = c.

Let us assume for the moment that we have already established
the existence of a region where A is always used. If B is used
at Q, it follows from Property T that the transformed point is again
in the same region. It cannot be true that B is used repeatedly
if x > 0, since eventually the y coordinate will be so small that
the point will be in the A-region. Hence, if at Q an optimal
policy employs B for the first k choices, the sequence of moves
has the form

S = BB ... (k times) ... BA.    (10.13)

On the basis of Property T, we are still in the region
$F(x,y,a) \ge 0$, x+y+a = c after employing B (k-1) times. The next
two moves, B and then A, cannot be optimal, however, since the
region is defined by the property that AB plus optimal continuation
is superior to BA plus optimal continuation. This shows that
at Q, move B cannot be used first in an optimal policy.

It remains then to establish the existence of the A-region
mentioned above. Since $f(x,y,a) > \phi(a)$ for $x,y \ge 0$ and one at
least positive, it follows that the inequality

$$ p_1 f(0,y,a+x) + p_2 f(c_2 x,y,a+c_1 x) + p_3 \phi(a) > q_1 f(x,0,a+y) + q_2 f(x,d_2 y,a+d_1 y) + q_3 \phi(a), \tag{10.14} $$

which holds at y = 0, must by virtue of the continuity of the
functions involved, for any x > 0, hold for some interval $0 \le y < y(x,a)$.

§11. The Exponential Utility Function.

One way of obtaining utility functions that have the desired
property is to make the boundary equation independent of a. If
we wish this to be true for all values of the parameters $p_i$ and $q_i$,
we must have

$$ \phi(a+x) - \phi(a) = G(x)H(a), \tag{11.1} $$

which yields, using standard arguments, under the assumption of
continuity,

(a) $\phi(z) = mz + n$, or

(b) $\phi(z) = c e^{bz}$.    (11.2)


We  have  already  considered  the  first  utility  function:  let 
us  now  consider  the  second. 

The  Important  property  of  these  utility  functions  Is  that  a 
policy  which  maximizes  the  expected  value  of  ^(z)  proceeds  at  each 
stage  without  regard  for  the  amount  already  obtained,  being  depen¬ 
dent  only  on  the  remaining  amount  to  be  obtained. 

If we set, for b > 0,

$$ g(x,y) = \max\ \mathrm{Exp}\,(e^{bz}) \tag{11.3} $$

("Exp" denoting here "expected value," not "exponential"),
we obtain for g the functional equation

$$ g(x,y) = \max \begin{cases} A:\ p_1 e^{bx} g(0,y) + p_2 e^{bc_1 x} g(c_2 x,y) + p_3 \\ B:\ q_1 e^{by} g(x,0) + q_2 e^{bd_1 y} g(x,d_2 y) + q_3 \end{cases} \tag{11.4} $$

As a special case of Theorem 4, we obtain
Theorem 5. The solution of (11.4) is as follows: For

$$ \frac{p_1 (e^{bx}-1) + p_2 (e^{bc_1 x}-1)}{p_3} > \frac{q_1 (e^{by}-1) + q_2 (e^{bd_1 y}-1)}{q_3} $$

use A; if the reverse inequality holds, employ B; if equal, either
is applicable.

(Footnote to the continuity assumption used for (11.2): This
requirement of continuity can be considerably weakened.)

Observe that, as should be true, the limit solution as
b -> 0 is exactly that obtained from $\phi(z) = z$.
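The rule of Theorem 5 and its small-b limit can be checked numerically. The sketch below is an illustration with hypothetical names and assumed parameter values; it writes the criterion as (10.8) specialized to $\phi(z) = e^{bz}$ and divided by $p_3 q_3 e^{ba}$, and uses `math.expm1` so that the differences $e^{bx} - 1$ stay accurate when b is small.

```python
import math


def exp_utility_move(x, y, b, p, q, c1, d1):
    """Choice of first move for exponential utility, following Theorem 5."""
    p1, p2, p3 = p
    q1, q2, q3 = q
    lhs = (p1 * math.expm1(b * x) + p2 * math.expm1(b * c1 * x)) / p3
    rhs = (q1 * math.expm1(b * y) + q2 * math.expm1(b * d1 * y)) / q3
    if lhs > rhs:
        return "A"
    if lhs < rhs:
        return "B"
    return "either"


# Illustrative parameters (assumed, not from the paper).
P, Q = (0.3, 0.5, 0.2), (0.4, 0.4, 0.2)
small_b_far = exp_utility_move(2.0, 1.0, 1e-6, P, Q, c1=0.4, d1=0.5)
small_b_near = exp_utility_move(1.0, 1.0, 1e-6, P, Q, c1=0.4, d1=0.5)
```

For small b each side is approximately b times the corresponding linear expression, so the comparison reduces to $(p_1 + p_2 c_1)x/p_3$ against $(q_1 + q_2 d_1)y/q_3$, which is the linear-utility criterion.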

§12. Asymptotic Behavior of g(x,y).

We now turn to the problem of determining the asymptotic
behavior of g(x,y) as x and y -> $\infty$. We begin by deriving the
asymptotic behavior of g(x,0) and g(0,y). From the equation we
obtain, for large x,

$$ g(x,0) = p_1 e^{bx} + p_3 + p_2 e^{bc_1 x} g(c_2 x, 0). \tag{12.1} $$

This equation may be solved by iteration:

$$ g(x,0) = (p_1 e^{bx} + p_3) + p_2 e^{bc_1 x} (p_1 e^{bc_2 x} + p_3) + \cdots \tag{12.2} $$


To obtain the asymptotic behavior, however, we must proceed
differently. Set

$$ g(x,0) = \frac{p_1 e^{bx}}{1-p_2} + h(x) e^{bx}, \tag{12.3} $$

where h satisfies the equation

$$ h(x) = p_3 e^{-bx} + p_2 h(c_2 x), \tag{12.4} $$

as we see by direct substitution. Although iteration yields

$$ h(x) = p_3 e^{-bx} + p_2 p_3 e^{-bc_2 x} + \cdots, \tag{12.5} $$

the asymptotic behavior of h(x) is still not apparent. We shall
show that $h(x) = x^{-a_1} [1 + o(1)]\, \phi(x)$, where $\phi(x) = \phi(c_2 x)$,
$a_1 = (\log 1/p_2)/(\log 1/c_2)$. To accomplish this, set $h(x) = k(x) x^{-a_1}$.
Then, since $p_2 c_2^{-a_1} = 1$, k satisfies the simpler equation

$$ k(x) - k(c_2 x) = p_3 x^{a_1} e^{-bx}. \tag{12.6} $$

The essential fact about the right-hand side that we shall use is that
$\sum_{n \ge 1} (x/c_2^n)^{a_1} e^{-bx/c_2^n}$ converges for each x. From (12.6) we have

$$ k(x/c_2^n) - k(x/c_2^{n-1}) = p_3 (x/c_2^n)^{a_1} e^{-bx/c_2^n}, \tag{12.7} $$

which yields

$$ \lim_{n \to \infty} k(x/c_2^n) = k(x) + p_3 \sum_{m=1}^{\infty} (x/c_2^m)^{a_1} e^{-bx/c_2^m} = \phi(x). \tag{12.8} $$

From the form of the limit function or from the equation for k(x),
we see that $\phi(x) = \phi(c_2 x)$ for all x. If then we write $y = x/c_2^n$
for $1 \le x \le 1/c_2$, we have

$$ k(y) = k(x/c_2^n) = [1 + o(1)]\, \phi(x/c_2^n) \tag{12.9} $$

as y -> $\infty$.
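The behavior claimed for h can be observed numerically. The following sketch rests on stated assumptions: it sums the series (12.5) in the form $h(x) = p_3 \sum_n p_2^n e^{-b c_2^n x}$, rescales by $x^{a_1}$ with $a_1 = \log(1/p_2)/\log(1/c_2)$, and uses hypothetical function names and parameter values that are not from the paper.

```python
import math


def h_series(x, b, p2, p3, c2, terms=400):
    """Partial sum of the iterated form of h: p3 * sum_n p2**n * exp(-b * c2**n * x)."""
    return p3 * sum(p2**n * math.exp(-b * c2**n * x) for n in range(terms))


def k_scaled(x, b, p2, p3, c2):
    """k(x) = h(x) * x**a1, the rescaled function appearing in (12.6)."""
    a1 = math.log(1.0 / p2) / math.log(1.0 / c2)
    return h_series(x, b, p2, p3, c2) * x**a1


# Illustrative parameters (assumed, not from the paper).
b, p2, p3, c2 = 1.0, 0.5, 0.2, 0.6
a1 = math.log(1.0 / p2) / math.log(1.0 / c2)

# Difference k(x) - k(c2*x) at x = 3, to be compared with p3 * x**a1 * exp(-b*x).
step_at_3 = k_scaled(3.0, b, p2, p3, c2) - k_scaled(3.0 * c2, b, p2, p3, c2)

# For large x the step is negligible, so k is nearly invariant under x -> x/c2.
drift_large_x = abs(k_scaled(40.0 / c2, b, p2, p3, c2) - k_scaled(40.0, b, p2, p3, c2))
```

The second quantity illustrates why k(x) settles onto a function of period 1 in $\log x / \log(1/c_2)$ rather than onto a constant.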

Collecting the previous results, we see that the asymptotic
behavior of g(x,0) is given by

$$ g(x,0) = \frac{p_1 e^{bx}}{1-p_2} + e^{bx} x^{-a_1} \phi(x) [1 + o(1)], \tag{12.10} $$

where

(a) $\phi(x) = \phi(c_2 x)$,

(b) $a_1 = \dfrac{\log 1/p_2}{\log 1/c_2}$.    (12.11)

The corresponding result for g(0,y) is

$$ g(0,y) = \frac{q_1 e^{by}}{1-q_2} + e^{by} y^{-b_1} \psi(y) [1 + o(1)], \tag{12.12} $$

where

(a) $\psi(y) = \psi(d_2 y)$,

(b) $b_1 = \log (1/q_2) / \log (1/d_2)$.    (12.13)


Turning to the equation for g(x,y), we have for x and y large

$$ g(x,y) = \max \begin{cases} \dfrac{p_1 q_1 e^{b(x+y)}}{1-q_2} + p_2 e^{bc_1 x} g(c_2 x,y) + O(e^{by}/y^{b_1}) \\ \dfrac{p_1 q_1 e^{b(x+y)}}{1-p_2} + q_2 e^{bd_1 y} g(x,d_2 y) + O(e^{bx}/x^{a_1}) \end{cases} \tag{12.14} $$

Letting $h(x,y)\, e^{b(x+y)} = g(x,y)$, we obtain

$$ h(x,y) = \max \begin{cases} \dfrac{p_1 q_1}{1-q_2} + p_2 h(c_2 x,y) + O(e^{-bx} y^{-b_1}) \\ \dfrac{p_1 q_1}{1-p_2} + q_2 h(x,d_2 y) + O(e^{-by} x^{-a_1}) \end{cases} \tag{12.15} $$

To simplify still further, we set h(x,y) = a + k(x,y), obtaining

$$ a + k(x,y) = \max \begin{cases} \dfrac{p_1 q_1}{1-q_2} + a p_2 + p_2 k(c_2 x,y) + O(e^{-bx} y^{-b_1}) \\ \dfrac{p_1 q_1}{1-p_2} + a q_2 + q_2 k(x,d_2 y) + O(e^{-by} x^{-a_1}) \end{cases} \tag{12.16} $$

If a is chosen to be the common solution of

$$ \frac{p_1 q_1}{1-q_2} + p_2 a = a = \frac{p_1 q_1}{1-p_2} + q_2 a, \tag{12.17} $$

namely, $a = p_1 q_1 / [(1-p_2)(1-q_2)]$, (12.16) simplifies to

$$ k(x,y) = \max \begin{cases} p_2 k(c_2 x,y) + O(e^{-bx} y^{-b_1}) \\ q_2 k(x,d_2 y) + O(e^{-by} x^{-a_1}) \end{cases} \tag{12.18} $$


To estimate k(x,y) we use the fact that the solution may be obtained
by means of successive approximations:

$$ k_{n+1}(x,y) = \max \begin{cases} p_2 k_n(c_2 x,y) + O(e^{-bx} y^{-b_1}) \\ q_2 k_n(x,d_2 y) + O(e^{-by} x^{-a_1}) \end{cases}, \qquad k_0(x,y) = \frac{1}{x^r+y^r}, \tag{12.19} $$

considering, for our purposes, only values of x and y greater than 1.
The exponent r will be chosen in a moment.

If we have an inequality of the type $k_n(x,y) \le u_n/(x^r+y^r)$,
$u_n$ being a constant, which inequality is certainly valid for n = 0,
we obtain

$$ k_{n+1}(x,y) \le \max \begin{cases} \dfrac{p_2 u_n}{c_2^r x^r + y^r} + O(e^{-bx} y^{-b_1}) \\ \dfrac{q_2 u_n}{x^r + d_2^r y^r} + O(e^{-by} x^{-a_1}) \end{cases} \tag{12.20} $$

Choose r so that $p_2 c_2^{-r} \le 1/2$, $q_2 d_2^{-r} \le 1/2$. Since $a_1, b_1 > r$,
we see that the error terms $x^{-a_1} e^{-by}$ and $y^{-b_1} e^{-bx}$ are each
bounded by $d_1/(x^r+y^r)$ for $x,y \ge 1$. Hence, we have

$$ k_{n+1}(x,y) \le \frac{1}{2} \frac{u_n}{x^r+y^r} + \frac{d_1}{x^r+y^r} = \frac{1}{2} \frac{u_n + d_2}{x^r+y^r} \tag{12.21} $$

for some constant $d_2$. If we take $u_{n+1} = \frac{1}{2}(u_n + d_2)$, the inequality
is preserved for $u_{n+1}$. Since $u_n$ as defined by the recurrence relation
is uniformly bounded, we obtain, in the limit, $k(x,y) \le u/(x^r+y^r)$
for some constant u.

Knowing the form of the function, we readily obtain the
optimal policy, deriving in this case the slightly paradoxical
result that, asymptotically, as x and y -> $\infty$, it makes no
difference which move is made first.

Collecting the above results, we obtain

$$ g(x,y) = e^{b(x+y)} \left[ \frac{p_1 q_1}{(1-p_2)(1-q_2)} + O\!\left( \frac{1}{x^r+y^r} \right) \right]. \tag{12.22} $$
§13. A Continuous Version.

As we have seen in the previous sections, the formulation of
the gold-mining problem in its discrete form leads to a number of
unsolved problems. We turn, therefore, to a continuous version of
the problem in the hope of overcoming our difficulties by use of
the more powerful tools of continuity. We can now resolve the
corresponding questions in complete detail and thereby obtain a
clear insight into the structure of the optimal policies. The
solutions determined in this way can now be used as approximations in
the original discrete process.

One very interesting and crucial fact emerges. Whereas the
original discrete problem had certain linear aspects, at least in
the case where we were considering expected return, the continuous
version is sufficiently nonlinear to permit a variational approach
in the classical manner. In carrying through this variational
attack our knowledge of the form of the solution in the discrete
formulation is of great service in telling us in advance what to
expect. It is a combination of the two techniques, old and new,
which permits a successful attack on the problem.

Let us now begin by discussing some methods we may follow to
obtain a continuous analogue to (1.1). The basic assumption is that
each operation is to have a high probability of obtaining a small
amount and leaving the machine undamaged, and a small probability of
obtaining nothing and damaging the machine.

Let, for $\delta > 0$ and small,

$1 - q_1\delta$ = probability of obtaining $r_1 x \delta$ and leaving
the machine undamaged, if A is used,

$1 - q_2\delta$ = probability of obtaining $r_2 y \delta$ and leaving
the machine undamaged, if B is used,    (13.1)

where $q_1, q_2 > 0$.

Setting, as before, f(x,y) equal to the total expected gain
obtained before the machine is damaged, we obtain the functional
equation

$$ f(x,y) = \max \begin{cases} (1-q_1\delta)\,(r_1 x\delta + f(x - r_1 x\delta, y)) \\ (1-q_2\delta)\,(r_2 y\delta + f(x, y - r_2 y\delta)) \end{cases} \tag{13.2} $$


If we proceed formally, letting $\delta \to 0$, we obtain

$$ f(x,y) = \max \begin{cases} A:\ f(x,y) + r_1 x\delta - q_1\delta f(x,y) - r_1 x\delta\, f_x(x,y) + o(\delta) \\ B:\ f(x,y) + r_2 y\delta - q_2\delta f(x,y) - r_2 y\delta\, f_y(x,y) + o(\delta) \end{cases} \tag{13.3} $$

or

$$ 0 = \max \begin{cases} r_1 x - q_1 f(x,y) - r_1 x\, f_x(x,y) \\ r_2 y - q_2 f(x,y) - r_2 y\, f_y(x,y) \end{cases} \tag{13.4} $$

This does not seem to be a fruitful approach because of the
difficulty of establishing any existence or uniqueness theorems, or,
in general, of treating the equation in (13.4) analytically.


In place of using a differential approach, we may use an
integral approach and then let $\delta \to 0$. Let us use (13.2) and
iterate, obtaining the corresponding equation for n steps at a
time. The result has the form

$$ f(x,y) = \max_{S_n} \left[ R_n(x,y) + \sum_k p_{nk}(x,y)\, f(x_{nk}, y_{nk}) \right], \tag{13.5} $$

where

$R_n(x,y)$ = expected return from n stages using the policy $S_n$,    (13.6)

$p_{nk}(x,y)$ = probability of surviving and being at $(x_{nk}, y_{nk})$ using $S_n$,

$S_n$ = policy pursued; i.e., the choice of A or B at each stage.

If $n\delta$ is chosen to remain finite as $\delta \to 0$, $n \to \infty$, and set
equal to t, the analogue of (13.5) is a functional equation of
the type

$$ f(x,y) = \max_{S(t)} \left[ R(x,y,t) + \int_{r=0}^{1} \int_{s=0}^{1} f(xr, ys)\, dG(r,s,x,y) \right]. \tag{13.7} $$


Functional equations of this class will occur in most
continuous versions of dynamic programming problems. We shall not enter
into any discussion of this formulation here because of the many
conceptual and mathematical difficulties connected with the
concept of a continuous strategy, particularly when the outcome
is stochastic. Instead we shall use a third approach which bears
the same connection to (13.7) as the use of the heat equation in
diffusion theory bears to the Chapman-Kolmogoroff equation. At
the moment it is sufficient for our purposes.

Let us begin by noting that according to our earlier results
the solution of (13.2) is determined by the boundary curve

$$ \frac{(1-q_1\delta)\, r_1 x\delta}{q_1\delta} = \frac{(1-q_2\delta)\, r_2 y\delta}{q_2\delta}, \tag{13.8} $$

which as $\delta \to 0$ approaches the line

$$ L:\ r_1 x/q_1 = r_2 y/q_2. \tag{13.9} $$

If (x,y) is below L, use A, continuing across horizontally until L
is hit, and then continuing down the line.

A strategy of this type is not included in the original
formulation of the problem which allowed only horizontal or vertical
motion, i.e., use of A or B. It is clear, however, that this
policy can be arbitrarily closely approximated by use of A and B
moves. This suggests that the continuous version of the original
problem may not possess a policy yielding a maximum return, but
only a sequence of policies yielding a supremum.

However, the introduction of mixing at a point introduces a
number of difficulties of both physical and mathematical kind.
Mathematically, we find ourselves confronted by the same
difficulties that made us wish to bypass (13.7); physically, we are
reluctant to accept this type of policy as one applicable to the original
problem which insisted upon a choice of A or B.

To avoid this concept of mixing at a point, we use a frequently
useful device. The essence of it is that for mathematical
purposes mixing over small intervals is equivalent to mixing at t
itself, under certain natural continuity assumptions; cf. [reference
illegible] for a further discussion.

We shall assume then that we are considering a process which
requires a choice of A or B at time points $0, \Delta, 2\Delta, \ldots$, etc., and
that over a typical interval $[k\Delta, (k+1)\Delta]$ we use A for the part
$[k\Delta, k\Delta + \phi_1\Delta]$ and B for $[k\Delta + \phi_1\Delta, (k+1)\Delta]$, where $\phi_1$ depends upon k.

Assuming that $\Delta$ is small and that first-order terms are
sufficient to describe the process, we shall derive a set of
differential equations which determine the process.

Having set up the equation, we shall, to illustrate the power
of the method, solve in turn problems corresponding to the two-choice,
three-choice, two-choice finite time, and general nonlinear utility
function.


§14. Derivation of the Differential Equation.

We assume as above that the total time interval is divided into
small intervals of length $\Delta$. In a typical interval $[k\Delta, (k+1)\Delta]$
the first part of the interval $[k\Delta, k\Delta + \phi_1\Delta]$ is devoted to the use of A.
If x is the amount of gold in mine A at the time $k\Delta$, there is
a probability $1 - q_1\phi_1\Delta$ that an amount $r_1 x \phi_1 \Delta$ is mined and the
operation may be continued; and a probability $q_1\phi_1\Delta$ that nothing
is obtained and the operation stops. The second part of the
interval $[k\Delta + \phi_1\Delta, (k+1)\Delta]$ is devoted to the use of B. If mine B contains
an amount y, then there is a probability $1 - q_2\phi_2\Delta$ that the amount
$r_2 y \phi_2 \Delta$ is obtained and the operation may be continued; and a
probability $q_2\phi_2\Delta$ that the operation ceases, where $\phi_2 = 1 - \phi_1$.

As far as first-order terms in $\Delta$ are concerned, it makes no
difference in what order the operations are performed. It is this
feature which allows this type of mixing to perform the function of
mixing at a point.

A strategy consists of a choice of $\phi_1$ and $\phi_2$ for each
interval. For any given strategy, let

x(t) = amount of gold remaining in A provided the operation
has continued to t,

y(t) = amount of gold remaining in B provided the operation
has continued to t,

p(t) = probability that the machine survives until t, i.e.,
that the operation continues until t,

f(t) = expected amount of gold mined up to time t,    (14.1)

where $t = n\Delta$, n = 0, 1, 2, ....

Ignoring the second-order terms in $\Delta$, we have

$$ \begin{aligned} x(t+\Delta) &= x(t) - r_1 \phi_1(t) x(t) \Delta, \\ y(t+\Delta) &= y(t) - r_2 \phi_2(t) y(t) \Delta, \\ p(t+\Delta) &= p(t)\,(1 - q_1 \phi_1(t)\Delta - q_2 \phi_2(t)\Delta), \\ f(t+\Delta) &= f(t) + p(t)\,[\phi_1(t) r_1 x(t) + \phi_2(t) r_2 y(t)]\,\Delta. \end{aligned} \tag{14.2} $$

Letting $\Delta \to 0$, we obtain the system of differential equations

$$ \begin{aligned} dx/dt &= -\phi_1(t) r_1 x(t), & x(0) &= x_0, \\ dy/dt &= -\phi_2(t) r_2 y(t), & y(0) &= y_0, \\ dp/dt &= -p(t)\,[\phi_1(t) q_1 + \phi_2(t) q_2], & p(0) &= 1, \\ df/dt &= p(t)\,[\phi_1(t) r_1 x(t) + \phi_2(t) r_2 y(t)], & f(0) &= 0. \end{aligned} \tag{14.3} $$

The problem is now to determine $\phi_1$, $\phi_2$, where

$$ 0 \le \phi_1(t) \le 1, \qquad \phi_2(t) = 1 - \phi_1(t), \tag{14.4} $$

so as to maximize f(T). A case of particular importance is $T = \infty$.

We shall derive similar equations for the three-choice problem.
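The first-order recurrences (14.2) can be stepped directly to evaluate any given strategy. The sketch below is an illustration only: the function name, the step size, and the parameter values are assumptions, and the strategy is passed as a hypothetical callable returning $\phi_1$ (with $\phi_2 = 1 - \phi_1$ as in (14.4)).

```python
def simulate(phi1, x0, y0, r1, r2, q1, q2, T, dt=1e-3):
    """Step the recurrences (14.2); phi1(t, x, y) must return a value in [0, 1]."""
    x, y, p, f = x0, y0, 1.0, 0.0
    for n in range(int(round(T / dt))):
        u1 = phi1(n * dt, x, y)
        u2 = 1.0 - u1
        # expected gold mined in this step, weighted by survival probability
        f += p * (u1 * r1 * x + u2 * r2 * y) * dt
        # survival probability, then the two mines, all to first order in dt
        p *= 1.0 - (q1 * u1 + q2 * u2) * dt
        x -= r1 * u1 * x * dt
        y -= r2 * u2 * y * dt
    return x, y, p, f


# Always use A, with illustrative values (not from the paper).
x_end, y_end, p_end, f_end = simulate(lambda t, x, y: 1.0,
                                      x0=2.0, y0=1.0, r1=1.0, r2=1.0,
                                      q1=0.5, q2=0.5, T=40.0)
```

With $\phi_1 = 1$ throughout, the system (14.3) integrates in closed form to $f(\infty) = r_1 x_0/(q_1 + r_1)$, so the simulated value should be close to 4/3 for these inputs, up to an error of order dt.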

§15. The Variational Procedure.

Let $\phi_1$ and $\phi_2$ be functions furnishing the maximum, and let

$$ \bar\phi_i = \phi_i + \epsilon \beta_i(t), \tag{15.1} $$

where $\epsilon$ is a small positive quantity, and $\beta_1, \beta_2$ are two functions
satisfying the conditions

$$ 0 \le \phi_i + \epsilon\beta_i \le 1 \tag{15.2} $$

(which implies $|\beta_i| \le 1/\epsilon$), and $\beta_1 + \beta_2 = 0$, so that the $\bar\phi_i$ are also
admissible.

It follows that $\beta_i(t) \le 0$ if $\phi_i(t) = 1$, $\beta_i(t) \ge 0$ if $\phi_i(t) = 0$,
and $\beta_i(t)$ can be of either sign if $0 < \phi_i(t) < 1$, the region where
free variation is permitted. Performing the variation, we find
readily that

$$ \begin{aligned} \bar x(t) &= x(t)\,(1 - \epsilon r_1 B_1(t)) + o(\epsilon), \\ \bar y(t) &= y(t)\,(1 - \epsilon r_2 B_2(t)) + o(\epsilon), \\ \bar p(t) &= p(t)\,(1 - \epsilon q_1 B_1(t) - \epsilon q_2 B_2(t)) + o(\epsilon), \\ \bar f'(t) &= f'(t)\,(1 - \epsilon q_1 B_1(t) - \epsilon q_2 B_2(t)) \\ &\quad + \epsilon\,[\, r_1 B_1(t) p(t) x'(t) + r_2 B_2(t) p(t) y'(t) + r_1 \beta_1(t) p(t) x(t) + r_2 \beta_2(t) p(t) y(t) \,] + o(\epsilon), \end{aligned} \tag{15.3} $$

where we have set

$$ B_i(t) = \int_0^t \beta_i(s)\, ds, \tag{15.4} $$

and the bars refer to the perturbed variables.

Integrating by parts to eliminate the $B_i(t)$, we find

$$ \bar f(T) - f(T) = \epsilon \int_0^T [\, K_1(t)\beta_1(t) + K_2(t)\beta_2(t) \,]\, dt + o(\epsilon), \tag{15.5} $$

where

$$ \begin{aligned} K_1(t) &= -q_1 \int_t^T f'(s)\, ds + r_1 p(T) x(T) - r_1 \int_t^T p'(s) x(s)\, ds, \\ K_2(t) &= -q_2 \int_t^T f'(s)\, ds + r_2 p(T) y(T) - r_2 \int_t^T p'(s) y(s)\, ds. \end{aligned} \tag{15.6} $$

Since $\bar f(T) - f(T) \le 0$, we see that whenever $K_i(t) > K_j(t)$ we
must have $\phi_i(t) = 1$, $\phi_j(t) = 0$. These relations yield implicit
equations for $\phi_i$ and $\phi_j$. In the next section we shall discuss the
behavior of the K-functions in more detail.

§16. The Behavior of $K_i$.

The fundamental relation is

$$ \frac{d}{dt}(K_1 - K_2) = (q_1 - q_2) f'(t) - p'(t)\,(r_2 y - r_1 x) = p\,[\, q_1 r_2 y - q_2 r_1 x \,]. \tag{16.1} $$

Thus a "mixed policy," one for which more than one of the $\phi_i$ is
positive for a given t, which implies $K_1(t) = K_2(t)$, can be optimal
only on the line $q_1 r_2 y = q_2 r_1 x$. This line is precisely the
boundary line that one obtains by passage to the limit from the
solution in the discrete case as $\delta \to 0$, as in (13.9).

If a mixed policy is pursued along the line, $\phi_1$ and $\phi_2$ must
be chosen to stay on this line, which means that the slope s = y/x
must be kept constant. Since

$$ \frac{d}{dt}(y/x) = s'(t) = s(t)\,[\, r_1 \phi_1 - r_2 \phi_2 \,], \tag{16.2} $$

we see that we must have

$$ \phi_1 = \frac{r_2}{r_1 + r_2}, \qquad \phi_2 = \frac{r_1}{r_1 + r_2}. \tag{16.3} $$

§17. The Solution for $T = \infty$.

With these preliminaries out of the way, let us determine
the optimal policy for the infinite process, $T = \infty$. The infinite
problem is, as usual, simpler than the finite case because of
the homogeneity introduced by infinite time; after any initial
actions, we are confronted by a problem of the same type, with
different initial values. Let us note that a consequence of this
and the homogeneity of the equations with respect to x and y is
that the decision at any point is a function only of the slope
s = y/x.

Let us begin by observing that above the line $q_1 r_2 y = q_2 r_1 x$
in the (x,y) plane if policy A is ever used, it is used thereafter.
This follows immediately from (16.1), which shows that $K_1 - K_2$ is
increasing when $q_1 r_2 y - q_2 r_1 x > 0$. Since use of A decreases x
and leaves y unchanged, once $K_1 > K_2$ the use of A maintains the
inequality.

Near the y-axis, however, the use of A continually is not as
rewarding as continual use of B. For if $\phi_1 = 1$, $\phi_2 = 0$, for $t \ge 0$,
we have

$$ \begin{aligned} x(t) &= x_0 e^{-r_1 t}, \\ y(t) &= y_0, \\ p(t) &= e^{-q_1 t}, \\ f(t) &= \int_0^t r_1 x_0 e^{-r_1 s} e^{-q_1 s}\, ds, \end{aligned} \tag{17.1} $$

$$ f_A(\infty) = r_1 x_0/(q_1 + r_1). $$

However, $\phi_1 = 0$, $\phi_2 = 1$ for all t yields similarly
$f_B(\infty) = r_2 y_0/(q_2 + r_2)$. For $y_0/x_0$ sufficiently large $f_B(\infty) > f_A(\infty)$.
Thus, there is a region near the y-axis where B is used.

This region where B is used extends down to the line $q_1 r_2 y
= q_2 r_1 x$. To prove this we observe that a mixed policy cannot be
pursued above the line and if A is ever used above the line it is
always used thereafter. Using A indefinitely, however, would
eventually take (x,y) into the region near the y-axis where B is
known to be optimal, a contradiction. Hence B is always used
above the line. Similarly, below the line A is always used.

When the line $q_1 r_2 y = q_2 r_1 x$ is reached, the point (x,y) must
remain on the line thereafter. For if not, then an A must
be used in a B region or vice versa, which is impossible. Hence,
on the line itself the mixed policy of (16.3) must be employed.

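We have thus demonstrated

Theorem 6. With reference to the equations (14.3) and the
constraints (14.4), the maximum value of $f(\infty)$ is attained by use of
the policy

$$ \begin{aligned} \phi_1 &= 1 \quad \text{for } q_1 r_2 y < q_2 r_1 x, \\ \phi_2 &= 1 \quad \text{for } q_1 r_2 y > q_2 r_1 x, \\ \phi_1 &= \frac{r_2}{r_1 + r_2}, \quad \phi_2 = \frac{r_1}{r_1 + r_2} \quad \text{for } q_1 r_2 y = q_2 r_1 x. \end{aligned} \tag{17.2} $$

Note that $\phi_1$ and $\phi_2$ are determined almost everywhere by the
above arguments, and hence are essentially unique.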
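The policy of Theorem 6 can be tried out against a pure policy by stepping the first-order recurrences (14.2). This is a sketch under stated assumptions: the names, the tolerance used to detect the line, the horizon, and the parameter values are all hypothetical choices for illustration, and a finite horizon stands in for $T = \infty$.

```python
def theorem6_phi1(x, y, r1, r2, q1, q2, rel_tol=1e-9):
    """Fraction of effort on A prescribed by (17.2)."""
    c = q1 * r2 * y - q2 * r1 * x
    scale = q1 * r2 * y + q2 * r1 * x
    if abs(c) <= rel_tol * scale:
        return r2 / (r1 + r2)          # mixed policy on the line
    return 1.0 if c < 0 else 0.0       # A below the line, B above it


def expected_yield(policy, x0, y0, r1, r2, q1, q2, T=60.0, dt=1e-3):
    """Accumulate f by the first-order recurrences (14.2) up to time T."""
    x, y, p, f = x0, y0, 1.0, 0.0
    for _ in range(int(round(T / dt))):
        u1 = policy(x, y)
        u2 = 1.0 - u1
        f += p * (u1 * r1 * x + u2 * r2 * y) * dt
        p *= 1.0 - (q1 * u1 + q2 * u2) * dt
        x, y = x * (1.0 - r1 * u1 * dt), y * (1.0 - r2 * u2 * dt)
    return f


# Symmetric illustrative case (values assumed): the start point lies on the line.
params = dict(r1=1.0, r2=1.0, q1=0.5, q2=0.5)
best = expected_yield(lambda x, y: theorem6_phi1(x, y, **params), 1.0, 1.0, **params)
only_A = expected_yield(lambda x, y: 1.0, 1.0, 1.0, **params)
```

In this symmetric case the mixed policy keeps x = y, and the system integrates by hand to $f(\infty) = 1$, against $r_1 x_0/(q_1 + r_1) = 2/3$ for continual use of A. Off the line, the discrete stepping alternates between A and B near the boundary, which is the small-interval mixing described in §13.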
§18. Solution for Finite Total Time.

In finding the solution for finite T, we shall begin by
determining what policy is used last. Since an optimal policy
has the property that its continuation after any initial part is
also optimal, we shall consider the case where T is small. We
have, for T close to 0,

$$ f(T) = \int_0^T p(s)\,[\, \phi_1(s) r_1 x(s) + \phi_2(s) r_2 y(s) \,]\, ds = r_1 x_0 \int_0^T \phi_1(s)\, ds + r_2 y_0 \int_0^T \phi_2(s)\, ds + o(T). \tag{18.1} $$

It follows then that for small T the maximum is obtained by taking
$\phi_1(s) = 1$, $\phi_2(s) = 0$ for $r_1 x > r_2 y$ and $\phi_1(s) = 0$, $\phi_2(s) = 1$ for
$r_2 y > r_1 x$. As is to be expected, for small durations expected gain,
without worry about termination, is the determining factor.

If $q_1 = q_2$ the lines $r_2 y = r_1 x$ and $q_1 r_2 y = q_2 r_1 x$ coincide,
and the optimal policy is easily found to be the same as that
for $T = \infty$.

Let us consider the general case where $q_1 \ne q_2$. Assume,
without loss of generality, that the line $r_2 y = r_1 x$ lies above the
line $q_1 r_2 y = q_2 r_1 x$. The positive quadrant is then divided into
three regions, which we label I, II, III.

Fig. 7


As before, it follows that in region I a B-policy once used
must be continued thereafter, while in regions II and III the same
holds for an A-policy. Also, in regions I and II an A-policy is
used if the time remaining is sufficiently small, and in III a
B-policy under the same conditions. From this we conclude that
an A-policy is always used in I, and a B-policy always while in III.

Let us now establish that an optimal policy never switches
from A to B. Let us suppose otherwise and let $t_0$ be the time at
which the change occurs. Since at $t_0$, A is terminated, the point
$(x(t_0), y(t_0))$ must be in region I, or on the boundary between I
and II. Using B will keep the point (x(t),y(t)) in I for all
$t \ge t_0$ since we know that B once used in I must be continued.
However, this contradicts the fact that A is used in I whenever the
time remaining is sufficiently small. Similarly, the combination
of using the mixed policy and then B cannot occur, since the
changeover must occur on the boundary between I and II, and B used
thereafter in region I, a contradiction.

This reduces the number of types of solutions to six: A always;
B always; the mixed policy followed by A; A then the mixed policy
and finally A; B then the mixed policy and then A; B followed by A.

Let $t_0$ be the value of t at which the last change of policy is
made in an optimal strategy, if such a change occurs. For $t_0 < t
\le T$, we must have $\phi_1(t) = 1$, $\phi_2(t) = 0$. We now compute the value
of $K_1(t_0) - K_2(t_0)$. We have for $t_0 \le t \le T$,

$$ \begin{aligned} x(t) &= x(t_0)\, e^{-r_1 (t - t_0)}, \qquad y(t) = y(t_0), \\ p(t) &= p(t_0)\, e^{-q_1 (t - t_0)}, \\ f'(t) &= p(t_0)\, r_1 x(t_0)\, e^{-(q_1 + r_1)(t - t_0)}, \end{aligned} \tag{18.2} $$

and, after some simplification,

$$ K_1(t_0) - K_2(t_0) = p(t_0)\, r_1 x(t_0) \left[ \left( 1 - \frac{q_2}{q_1 + r_1} \right) e^{-(q_1 + r_1)(T - t_0)} + \frac{q_2}{q_1 + r_1} - \frac{r_2 y(t_0)}{r_1 x(t_0)} \right]. \tag{18.3} $$

For any fixed point $(x(t_0), y(t_0))$ in region II the right side is
positive for $T - t_0$ small, and negative for $T - t_0$ large. It is equal
to zero for precisely one value of $T - t_0$. This zero determines when
the changeover occurs. When it occurs, A is used for the remaining
time, with any of the six beginnings above, depending upon the
location of the initial point.
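The zero mentioned for (18.3) can be found in closed form. The sketch below assumes that the bracket in (18.3) is $(1 - \kappa)e^{-(q_1 + r_1)\tau} + \kappa - \rho$, with $\kappa = q_2/(q_1 + r_1)$, $\rho = r_2 y/(r_1 x)$ and $\tau = T - t_0$, which is how the expression works out from (15.6) and (18.2). The function names and the numerical values are hypothetical.

```python
import math


def bracket(tau, x, y, r1, r2, q1, q2):
    """Bracketed factor of (18.3) as a function of the time remaining tau."""
    kappa = q2 / (q1 + r1)
    rho = (r2 * y) / (r1 * x)
    return (1.0 - kappa) * math.exp(-(q1 + r1) * tau) + kappa - rho


def time_remaining_at_switch(x, y, r1, r2, q1, q2):
    """Solve bracket(tau) = 0 for tau > 0; return None if no positive root exists."""
    kappa = q2 / (q1 + r1)
    rho = (r2 * y) / (r1 * x)
    ratio = (rho - kappa) / (1.0 - kappa)   # required value of exp(-(q1+r1)*tau)
    if not (0.0 < ratio < 1.0):
        return None
    return -math.log(ratio) / (q1 + r1)


# Illustrative point between the two lines (values assumed, not from the paper).
tau = time_remaining_at_switch(1.0, 0.7, r1=1.0, r2=1.0, q1=1.0, q2=0.5)
```

For these inputs the point satisfies $q_1 r_2 y > q_2 r_1 x$ and $r_1 x > r_2 y$, so it lies between the two lines, and the root is $\tau = -\tfrac{1}{2}\log 0.6$. A point above the line $r_2 y = r_1 x$ gives no positive root.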


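§19. The Three-choice Problem.

The continuous version of the three-choice problem mentioned
above is the following: Given

$$ \begin{aligned} \frac{dx}{dt} &= -[\, \phi_1(t) r_1 + \phi_3(t) r_3 \,]\, x(t), & x(0) &= x_0, \\ \frac{dy}{dt} &= -[\, \phi_2(t) r_2 + \phi_3(t) r_4 \,]\, y(t), & y(0) &= y_0, \\ \frac{dp}{dt} &= -[\, \phi_1(t) q_1 + \phi_2(t) q_2 + \phi_3(t) q_3 \,]\, p(t), & p(0) &= 1, \\ \frac{df}{dt} &= p(t)\,[\, (\phi_1(t) r_1 + \phi_3(t) r_3)\, x(t) + (\phi_2(t) r_2 + \phi_3(t) r_4)\, y(t) \,], & f(0) &= 0, \end{aligned} \tag{19.1} $$

where for all t

$$ 0 \le \phi_i \le 1, \qquad \phi_1 + \phi_2 + \phi_3 = 1, \tag{19.2} $$

it is required to determine the $\phi_i(t)$ so as to maximize f(T).
We shall consider only the case where $T = \infty$.

As before, let us set $B_i(t) = \int_0^t \beta_i(s)\, ds$.
We obtain

$$ \begin{aligned} \bar x(t) &= x(t)\,(1 - \epsilon r_1 B_1(t) - \epsilon r_3 B_3(t)) + o(\epsilon), \\ \bar y(t) &= y(t)\,(1 - \epsilon r_2 B_2(t) - \epsilon r_4 B_3(t)) + o(\epsilon), \\ \bar p(t) &= p(t)\,\Bigl(1 - \epsilon \sum_{i=1}^{3} q_i B_i(t)\Bigr) + o(\epsilon), \\ \frac{df}{dt} &= p\,[\, (\phi_1 r_1 + \phi_3 r_3)\, x + (\phi_2 r_2 + \phi_3 r_4)\, y \,]. \end{aligned} \tag{19.3} $$

Consequently, following the same technique as before, we obtain

$$ \bar f(T) - f(T) = \epsilon \int_0^T [\, K_1 \beta_1 + K_2 \beta_2 + K_3 \beta_3 \,]\, dt + o(\epsilon), \tag{19.4} $$

where

$$ \begin{aligned} K_1(t) &= -q_1 \int_t^T f'(s)\, ds + r_1 p(T) x(T) - r_1 \int_t^T p'(s) x(s)\, ds, \\ K_2(t) &= -q_2 \int_t^T f'(s)\, ds + r_2 p(T) y(T) - r_2 \int_t^T p'(s) y(s)\, ds, \\ K_3(t) &= -q_3 \int_t^T f'(s)\, ds + p(T)\,[\, r_3 x(T) + r_4 y(T) \,] - \int_t^T p'(s)\,[\, r_3 x(s) + r_4 y(s) \,]\, ds. \end{aligned} \tag{19.5} $$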


§20. Some Lemmas and Preliminary Results.

The statements in the lemmas below concerning the dependence
of the $\phi_i$ upon the $K_i$ are, of course, taken to hold almost everywhere.

Lemma 1. If $K_i(t) > K_j(t)$, then $\phi_i(t) = 1$ or $\phi_j(t) = 0$.

Proof: Let E be the set of t for which the assertion does not
hold. Let $\beta_i = 1$, $\beta_j = -1$ for t in E, and let the $\beta$'s be zero
otherwise. The variation is admissible for $\epsilon$ sufficiently small
and makes $\bar f(T) - f(T)$ positive if m(E) > 0.

Lemma 2. If $K_i(t) > K_j(t)$ for $j \ne i$, then $\phi_i = 1$.

The proof follows immediately from the above.

Lemma 3. If there is a j such that $K_i(t) < K_j(t)$, then $\phi_i = 0$.

Again a simple consequence of Lemma 1.

Let us now compute the derivatives of the $K_i$. A straightforward calculation yields the symmetric results

$$\begin{aligned}
K_1'(t) &= p\,[\,C_1\phi_2 + C_2\phi_3\,],\\
K_2'(t) &= p\,[-C_1\phi_1 - C_3\phi_3\,],\\
K_3'(t) &= p\,[-C_2\phi_1 + C_3\phi_2\,],
\end{aligned} \tag{20.1}$$

where we have set

$$\begin{aligned}
C_1 &= q_1 r_2 y - q_2 r_1 x,\\
C_2 &= q_1 r_4 y - (q_3 r_1 - q_1 r_3)\,x,\\
C_3 &= (q_3 r_2 - q_2 r_4)\,y - q_2 r_3 x.
\end{aligned} \tag{20.2}$$

The relative positions of the three lines $C_i = 0$ are determined by the quantity

$$D = q_1 r_2 r_3 + q_2 r_1 r_4 - q_3 r_1 r_2. \tag{20.3}$$

If we assume that all three lines lie in the positive quadrant, a straightforward calculation shows that if $D > 0$ the lines have the position shown in Fig. 8, while if $D < 0$, they lie as shown in Fig. 9.
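The role of $D$ can be checked directly from (20.2). Writing $s_1, s_2, s_3$ for the slopes $y/x$ of the lines $C_1 = 0$, $C_2 = 0$, $C_3 = 0$, one finds $s_1 - s_2 = D/(q_1 r_2 r_4)$ and $s_3 - s_1 = q_2 D/[q_1 r_2 (q_3 r_2 - q_2 r_4)]$, so that $C_3 = 0$ lies above $C_1 = 0$, which lies above $C_2 = 0$, when $D > 0$, and the order is reversed when $D < 0$. The sketch below merely evaluates these quantities; the function names and the sample parameters are assumptions made for illustration.

```python
def discriminant(r, q):
    """The quantity D of (20.3)."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    return q1 * r2 * r3 + q2 * r1 * r4 - q3 * r1 * r2


def line_slopes(r, q):
    """Slopes y/x of the lines C1 = 0, C2 = 0, C3 = 0 read off from (20.2)."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    if q3 * r1 - q1 * r3 <= 0 or q3 * r2 - q2 * r4 <= 0:
        # C2 = 0 or C3 = 0 does not pass through the positive quadrant
        raise ValueError("a line lies outside the positive quadrant")
    s1 = q2 * r1 / (q1 * r2)
    s2 = (q3 * r1 - q1 * r3) / (q1 * r4)
    s3 = q2 * r3 / (q3 * r2 - q2 * r4)
    return s1, s2, s3


def order_from_top(r, q):
    """Names of the three lines, steepest (nearest the y-axis) first."""
    slopes = line_slopes(r, q)
    names = ("C1", "C2", "C3")
    return [name for _, name in sorted(zip(slopes, names), reverse=True)]
```

With $r = (1, 1, 1, 0.6)$ the choice $q = (1, 1, 1.5)$ gives $D > 0$ and $q = (1, 1, 2)$ gives $D < 0$, so both orderings are easy to exhibit.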

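[Figs. 8 and 9: the lines $C_1 = 0$, $C_2 = 0$, $C_3 = 0$ in the positive quadrant, for $D > 0$ and $D < 0$ respectively; each $C_i$ is positive above its line and negative below it.]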

It is possible for both cases $D > 0$, $D < 0$ to occur. The case where one of the lines $C_2 = 0$, $C_3 = 0$ lies outside the positive quadrant yields an immediate simplification of the following arguments without changing the over-all structure. Consequently, we shall discuss in detail only the above cases.


§21. Mixed Policies.

As above, we denote by the term "mixed policy" a situation in which the $\phi_i$ have values different from 0 and 1. By an A-policy we shall mean $\phi_1 = 1$, a B-policy $\phi_2 = 1$, and a C-policy $\phi_3 = 1$. Let us prove

Lemma 4. No optimal policy contains a mixture of A, B, and C policies.

Proof: Let us assume that in some interval we have simultaneously $\phi_1, \phi_2, \phi_3 > 0$. In this interval we must have $K_1 = K_2 = K_3$. This yields

$$\begin{aligned}
K_1' - K_2' &= p\,[\,C_1\phi_1 + C_1\phi_2 + (C_2 + C_3)\phi_3\,] = 0,\\
K_1' - K_3' &= p\,[\,C_2\phi_1 + (C_1 - C_3)\phi_2 + C_2\phi_3\,] = 0.
\end{aligned}$$

The solution for $\phi_1, \phi_2, \phi_3$ is, if $C_1 - C_2 - C_3 \ne 0$,

$$\phi_1 = \frac{-C_3}{C_1 - C_2 - C_3}, \qquad \phi_2 = \frac{-C_2}{C_1 - C_2 - C_3}, \qquad \phi_3 = \frac{C_1}{C_1 - C_2 - C_3}. \tag{21.2}$$

Since the $\phi_i$ must be positive in this interval, we must have $C_1$, $-C_2$, $-C_3$ all of the same sign. It is easily verified upon referring to Figs. 8 and 9 that in both cases $D > 0$, $D < 0$, this can never occur.

Furthermore, $C_1 - C_2 - C_3 = 0$ only if the lines $C_1 = 0$, $C_2 = 0$, $C_3 = 0$ coincide. When this occurs the problem is equivalent to the two-choice problem.
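The solution (21.2) can be confirmed by substitution. The sketch below does this in exact rational arithmetic; it uses the two conditions $K_1' = K_2' = K_3'$ together with $\phi_1 + \phi_2 + \phi_3 = 1$, and the function names are introduced here only for illustration.

```python
from fractions import Fraction


def three_way_mixture(c1, c2, c3):
    """Candidate (phi1, phi2, phi3) of (21.2) for given values of C1, C2, C3."""
    c1, c2, c3 = Fraction(c1), Fraction(c2), Fraction(c3)
    d = c1 - c2 - c3
    if d == 0:
        raise ValueError("C1 - C2 - C3 = 0: the three lines coincide")
    return (-c3 / d, -c2 / d, c1 / d)


def residuals(c1, c2, c3, phi):
    """Bracketed expressions of K1' - K2' and K1' - K3'; both must vanish."""
    phi1, phi2, phi3 = phi
    e12 = c1 * phi1 + c1 * phi2 + (c2 + c3) * phi3
    e13 = c2 * phi1 + (c1 - c3) * phi2 + c2 * phi3
    return e12, e13
```

For $C_1 = 2$, $C_2 = -1$, $C_3 = -3$, where $C_1$, $-C_2$, $-C_3$ share a sign, the mixture is $(1/2, 1/6, 1/3)$; changing $C_2$ to $1$ makes $\phi_2$ negative, in line with the sign condition stated in the proof.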

Let us now investigate the possibility of using mixed policies involving only two of the three policies, A, B, or C.

Lemma 5. Concerning the mixing of two and only two policies, we have the following results:

(a) A mixture of A and B is permissible only along $C_1 = 0$, where $\phi_1 = r_2/(r_1 + r_2)$, $\phi_2 = r_1/(r_1 + r_2)$.

(b) A mixture of A and C is permissible only along $C_2 = 0$, where

$$\phi_1 = \frac{r_4 - r_3}{r_1 + r_4 - r_3}, \qquad \phi_3 = \frac{r_1}{r_1 + r_4 - r_3}. \tag{21.3}$$

(c) A mixture of B and C is permissible only along $C_3 = 0$, where

$$\phi_2 = \frac{r_3 - r_4}{r_2 + r_3 - r_4}, \qquad \phi_3 = \frac{r_2}{r_2 + r_3 - r_4}.$$

Proof: If $\phi_1, \phi_2 > 0$, $\phi_3 = 0$, we must have $K_1 = K_2 \ge K_3$. In an interval where this occurs,

$$0 = K_1' - K_2' = p\,[\,C_1(\phi_1 + \phi_2)\,] = pC_1.$$

Hence $C_1 = 0$. The values of $\phi_1$ and $\phi_2$ which keep $(x, y)$ on this line are determined as in the two-choice case. The other assertions in Lemma 5 are obtained similarly.
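The mixing proportions in Lemma 5 are exactly those which hold the slope $y/x$ fixed, since by (19.1) the logarithmic derivative of $y/x$ is $(\phi_1 r_1 + \phi_3 r_3) - (\phi_2 r_2 + \phi_3 r_4)$. A small sketch in exact arithmetic, with function names chosen here for illustration, makes this concrete and also shows that the AC proportions leave the interval $[0, 1]$ when $r_3 > r_4$.

```python
from fractions import Fraction


def mixing_fractions(pair, r):
    """(phi1, phi2, phi3) for the two-policy mixtures of Lemma 5.

    A zero denominator (for example r1 + r4 = r3 for "AC") raises
    ZeroDivisionError; no mixture keeps the slope fixed in that case.
    """
    r1, r2, r3, r4 = (Fraction(v) for v in r)
    zero = Fraction(0)
    if pair == "AB":
        return (r2 / (r1 + r2), r1 / (r1 + r2), zero)
    if pair == "AC":
        d = r1 + r4 - r3
        return ((r4 - r3) / d, zero, r1 / d)
    if pair == "BC":
        d = r2 + r3 - r4
        return (zero, (r3 - r4) / d, r2 / d)
    raise ValueError("pair must be 'AB', 'AC' or 'BC'")


def log_slope_rate(phi, r):
    """d/dt log(y/x) along (19.1) for constant controls phi."""
    r1, r2, r3, r4 = (Fraction(v) for v in r)
    phi1, phi2, phi3 = phi
    return (phi1 * r1 + phi3 * r3) - (phi2 * r2 + phi3 * r4)
```

With $r = (1, 1, 1, 3/5)$ all three mixtures give a zero slope rate, the BC mixture is $(0, 2/7, 5/7)$, and the AC mixture has $\phi_1 = -2/3$, which is not admissible.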

§22. The Solution for Infinite Time, D > 0.

Having obtained these auxiliary results, we now proceed to find the solution to the problem of maximizing $f(\infty)$. We shall assume that $r_3 > r_4$, since the case $r_4 > r_3$ can be handled by interchanging the roles of $x$ and $y$ and A and B. The degenerate case, $r_3 = r_4$, will be discussed separately.

Let us make an initial observation that when $r_3 > r_4$ the mixed AC policy is never used, for by (21.3) $\phi_1$ and $\phi_3$ cannot both be positive. The solution takes two distinct forms depending upon whether $D > 0$ or $D < 0$. Let us begin by considering $D > 0$. We shall establish the principal results in a series of lemmas.

Lemma 6. In an optimal policy, B is used near the y-axis.

Proof: There is a region near the y-axis where A is not used. For if $C_1 > 0$, $C_2 > 0$ and A is used, i.e., $\phi_1(t) = 1$, we have $K_1' = 0$, $K_2' < 0$, $K_3' < 0$. This means that $K_1$ remains the largest for $t_1 \ge t$. Hence if A is used in this region, it must be pursued thereafter. Let us now compute the results of a continued A-policy, a continued B-policy, and a continued C-policy. We have

$$f_A(\infty) = \frac{r_1 x_0}{q_1 + r_1}, \qquad f_B(\infty) = \frac{r_2 y_0}{q_2 + r_2}, \qquad f_C(\infty) = \frac{r_3 x_0}{q_3 + r_3} + \frac{r_4 y_0}{q_3 + r_4}. \tag{22.1}$$

A comparison of $f_A(\infty)$ and $f_B(\infty)$ shows that $f_B(\infty) > f_A(\infty)$ for $y/x$ sufficiently large.
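The comparison can be made explicit: by (22.1), $f_B(\infty) > f_A(\infty)$ precisely when $y_0/x_0 > r_1(q_2 + r_2)/[r_2(q_1 + r_1)]$. The following sketch simply evaluates (22.1) and this critical slope; the function names and the sample values are assumptions for illustration.

```python
def pure_returns(x0, y0, r, q):
    """Expected returns (22.1) of continued A-, B- and C-policies."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    return {
        "A": r1 * x0 / (q1 + r1),
        "B": r2 * y0 / (q2 + r2),
        "C": r3 * x0 / (q3 + r3) + r4 * y0 / (q3 + r4),
    }


def b_beats_a_slope(r, q):
    """Slope y0/x0 above which a continued B-policy beats a continued A-policy."""
    r1, r2 = r[0], r[1]
    q1, q2 = q[0], q[1]
    return r1 * (q2 + r2) / (r2 * (q1 + r1))
```

For $r = (1, 1, 1, 0.6)$ and $q = (1, 1, 1.5)$ the critical slope is 1, so B is the better continued policy at $(1, 1.5)$ and A at $(1, 0.5)$.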

Let us now show that in the region above $C_3 = 0$, if C is used it is used continually thereafter. Using C increases the slope $s(t) = y(t)/x(t)$, for with $\phi_3 = 1$ we have

$$s'(t) = s(t)\,(r_3 - r_4) > 0. \tag{22.2}$$

On the other hand, using B decreases the slope. Hence, we cannot use B after C, for to do so would return us to a region where C was to be used. We have already shown that A cannot be used after C when close to the y-axis. A comparison of $f_B(\infty)$ and $f_C(\infty)$ shows that it is better to use B rather than C near the y-axis if $r_2 y/(q_2 + r_2) > r_4 y/(q_3 + r_4)$, or $q_3 r_2 - q_2 r_4 > 0$. This, however, is precisely equivalent to the condition that $C_3 = 0$ lie within the positive quadrant, which we have assumed.

It follows then that neither A nor C is used in a region near the y-axis, and we know that no mixed policy is pursued there. Consequently, B must be used in a region adjoining the y-axis.

Lemma 7. The lower boundary of the B-region adjoining the y-axis is the line $C_3 = 0$. On that line a mixed BC-policy is employed. Below $C_3 = 0$, B is never used.

Proof: Let us begin with initial values $(x_0, y_0)$ near the y-axis in the region where B is used and consider what form an optimal strategy can have. B cannot be used indefinitely since this would eventually take $(x, y)$ near the x-axis where comparison of $f_A(\infty)$ and $f_B(\infty)$ shows that A is superior. However, since both A and C increase the ratio $y/x$, B cannot be followed by A or C since both of these put the point $(x, y)$ back into a region where B is to be used. It follows that B must be followed by one of the mixed policies AB or BC.

Let us show, however, that for $D > 0$, the mixture AB never occurs in an optimal strategy. For if AB is used we have, by (20.1),

$$K_3' = p\,[-C_2\phi_1 + C_3\phi_2\,] < 0. \tag{22.3}$$

Since $K_1(\infty) = K_2(\infty) = K_3(\infty) = 0$ and $K_1' = K_2' = 0$ while AB is being used, it follows that $K_3 > K_1 = K_2$ while the AB-mixture is being used. This, however, implies that $\phi_3 = 1$, $\phi_1 = \phi_2 = 0$, which is a contradiction.

The remaining possibility then is that BC is used after B on the line $C_3 = 0$. B cannot be used below this line as a consequence of the above arguments.

Lemma 8. C is used in the region between the line $C_3 = 0$ and a line $L = 0$ which is below $C_2 = 0$.

Proof: A is not used in a region near the line $C_3 = 0$ because when the BC-mixed policy is used we have $K_1'(t) = p\,[C_1\phi_2 + C_2\phi_3] > 0$ and also $K_2(t) = K_3(t) > K_1(t)$. Hence, immediately before BC is used $K_1 < K_2$ and $K_1 < K_3$; therefore A is not used. Consequently, C must be used immediately below $C_3 = 0$.

The C region actually extends below the line $C_2 = 0$. While C is followed, $K_1'(t) = pC_2$, which is positive when $(x(t), y(t))$ is above $C_2 = 0$. Hence, $K_1 < K_3$ when $(x, y)$ is in that region, and C is employed. Also immediately below $C_2 = 0$ we still must have $K_1 < K_3$ so that C is still used there; but now $K_1$ decreases as $t$ increases.

There are two conceivable possibilities. Either C is used in the whole region between $C_3 = 0$ and the x-axis, or the line $L = 0$ (which is the lower bound of the C region) is between $C_2 = 0$ and the x-axis. In the second case the position of the line $L = 0$ is determined by where $K_1 = K_3$. The following lemmas show that in fact only this second possibility can occur.

Lemma 9. A is used in the region between $L = 0$ and the x-axis.

Proof: The statement is vacuous unless $L = 0$ is above the x-axis. If it is above, let $t_0$ be the time of changeover from A to C, so that $K_1(t_0) = K_3(t_0)$. But when A is employed, $K_1'(t) = 0$, $K_2'(t) = -pC_1 > 0$, $K_3'(t) = -pC_2 > 0$. It follows that $K_1(t) > K_2(t)$ and $K_1(t) > K_3(t)$ for all $t < t_0$, so that no other policy can be used before A.

Lemma 10. The region where A is used is nonvacuous; that is, the line $L = 0$ is above the x-axis.

Proof: We proceed by contradiction. Suppose that the assertion were false and $L$ coincided with the x-axis. Let $(x_0, y_0)$ be chosen below $C_3 = 0$. If C is used until the mixture BC is used along $C_3 = 0$ we must have $K_3'(t) = 0$ for all $t > 0$. Since $K_3(\infty) = 0$, we have $K_3(0) = 0$. Since C is preferable at $(x_0, y_0)$ we must have $0 = K_3(0) \ge K_1(0)$. Hence since $K_1(\infty) = 0$, we have

$$K_1(\infty) - K_1(0) = \int_0^{t^*} p(t)\,C_2\,dt + \int_{t^*}^{\infty} p(t)\,[C_1\phi_2 + C_2\phi_3]\,dt \ge 0, \tag{22.4}$$

where $t^*$ is the time of changeover from C to BC. Keeping $x_0$ fixed, let $y_0 \to 0$. This entails $t^* \to \infty$. Since $C_1\phi_2 + C_2\phi_3$ is uniformly bounded, the second integral tends to zero. We have then, using the expressions for $x$, $y$, $p$ obtained from a C-policy,

$$\lim_{y_0 \to 0} \int_0^{t^*} e^{-q_3 t}\,\bigl[q_1 r_4 y_0 e^{-r_4 t} - (q_3 r_1 - q_1 r_3)\,x_0 e^{-r_3 t}\bigr]\,dt = -\int_0^{\infty} (q_3 r_1 - q_1 r_3)\,x_0\,e^{-(q_3 + r_3)t}\,dt \ge 0, \tag{22.5}$$

or

$$\frac{q_3 r_1 - q_1 r_3}{q_3 + r_3} \le 0, \tag{22.6}$$

which contradicts the assumption that the line $C_2 = 0$ passes through the positive quadrant.

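This completes the consideration of the case $D > 0$ when both $C_2 = 0$ and $C_3 = 0$ are contained in the positive quadrant. The complete result is

Theorem 7. If $D = q_1 r_2 r_3 + q_2 r_1 r_4 - q_3 r_1 r_2 > 0$, the solution to the problem of maximizing $f(\infty)$ subject to (19.1) is given by Fig. 10.

[Fig. 10: the positive quadrant divided by the lines $C_3 = 0$ and $L = 0$ into a B-region adjoining the y-axis, a C-region between the two lines, and an A-region adjoining the x-axis, with the mixture BC used on $C_3 = 0$.]

It does not seem possible to specify $L$ in any simple way.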




Finally, let us discuss the degenerate cases in which $C_2 = 0$ or $C_3 = 0$ do not lie in the positive quadrant. If $C_3 = 0$ lies outside, the C-region extends all the way to the y-axis. If $C_2 = 0$ lies outside, the C-region extends all the way to the x-axis.

§23. D < 0.

Let us now consider the case in which $D < 0$. In this case it turns out that C is never used, which means that the solution is as given in the two-choice problem.

Lemma 11. B is used near the y-axis.

Proof: Precisely as before.

Lemma 12. The lower boundary of the B-region adjoining the y-axis is $C_1 = 0$. On that line AB is used. Below the line B is not used.

Proof: As in the case $D > 0$ we conclude that a B-policy must be followed by one of the mixed policies AB or BC. However, in the present case where $D < 0$, the mixed policy BC cannot be used in an optimal strategy. For when BC is used, we have

$$K_1'(t) = p\,[C_1\phi_2 + C_2\phi_3] < 0, \tag{23.1}$$

because $C_3 = 0$ is below $C_2 = 0$ and $C_1 = 0$. Also $K_1(\infty) = K_2(\infty) = K_3(\infty) = 0$, and $K_2'(t) = K_3'(t) = 0$ when the mixed policy BC is used. Hence $K_1(t) > K_2(t) = K_3(t)$ when the BC-mixture is used. This, however, is a contradiction since it implies that $\phi_1 = 1$, $\phi_2 = \phi_3 = 0$. Hence, a B-policy must be followed by use of AB on $C_1 = 0$.

Again the same argument as above shows that B is not used below $C_1 = 0$.

Lemma 13. A is used in the entire region between $C_1 = 0$ and the x-axis.

Proof: First, C is not used just before the AB-mixture. While AB is employed, $K_1'(t) = K_2'(t) = 0$, and $K_3'(t) = p\,[-C_2\phi_1 + C_3\phi_2] > 0$, as can be seen from Fig. 9. It follows that $K_3 < K_2$ and $K_3 < K_1$ immediately before the changeover to AB occurs. Hence C is not used immediately before AB.

It follows then that there is a region below $C_1 = 0$ and adjoining this line, where A is used. However, it is impossible to use another choice before A in an optimal policy. When A is used below $C_1 = 0$, we have

$$K_1'(t) = 0, \qquad K_2'(t) = -pC_1 > 0, \qquad K_3'(t) = -pC_2 > 0. \tag{23.2}$$

Hence, $K_1$ is largest for all smaller $t$, and the A-region extends to the x-axis.
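The policy assembled in Lemmas 11 to 13 (B above $C_1 = 0$, A below it, the AB mixture on it) has an expected return that can be written in closed form, since each phase of (19.1) is then a pure exponential. The sketch below is one way to evaluate it; the function name is hypothetical, and the rates `rm` and `qm` are those produced by the proportions of Lemma 5(a). Comparing the value with the pure-policy returns (22.1) on sample parameters with $D < 0$ is a consistency check, not a proof.

```python
import math


def two_choice_value(x0, y0, r, q):
    """Expected return of: B above C1 = 0, A below it, AB along it (x0, y0 > 0)."""
    r1, r2 = r[0], r[1]
    q1, q2 = q[0], q[1]
    s1 = q2 * r1 / (q1 * r2)                # slope y/x of the line C1 = 0
    rm = r1 * r2 / (r1 + r2)                # common depletion rate of x and y under AB
    qm = (q1 * r2 + q2 * r1) / (r1 + r2)    # rate of loss of the machine under AB
    if y0 >= s1 * x0:
        # Phase 1: continued B until y/x has fallen to s1.
        t1 = math.log(y0 / (s1 * x0)) / r2
        gain = r2 * y0 * (1.0 - math.exp(-(q2 + r2) * t1)) / (q2 + r2)
        p1 = math.exp(-q2 * t1)
        x1, y1 = x0, y0 * math.exp(-r2 * t1)
    else:
        # Phase 1: continued A until y/x has risen to s1.
        t1 = math.log(s1 * x0 / y0) / r1
        gain = r1 * x0 * (1.0 - math.exp(-(q1 + r1) * t1)) / (q1 + r1)
        p1 = math.exp(-q1 * t1)
        x1, y1 = x0 * math.exp(-r1 * t1), y0
    # Phase 2: AB forever, discounted by the survival probability p1.
    return gain + p1 * rm / (qm + rm) * (x1 + y1)
```

With $r = (1, 1, 1, 0.6)$, $q = (1, 1, 2)$ (so that $D = -0.4$) and $(x_0, y_0) = (1, 3)$ the value is $14/9$, which exceeds the returns $0.5$, $1.5$ and about $1.03$ of the continued A-, B- and C-policies.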

Collecting the above results, we have

Theorem 8. If $D = q_1 r_2 r_3 + q_2 r_1 r_4 - q_3 r_1 r_2 < 0$, the solution to the problem of maximizing $f(\infty)$ never uses a C-policy and has the two-choice form:

[Fig. 11: the positive quadrant divided by the line $C_1 = 0$ into a B-region above the line and an A-region below it, with the mixture AB used on the line.]

§24. The Case $r_3 = r_4$.

Some of the preceding arguments fail in this case because the C-policy keeps the slope $y/x$ constant. It follows from the formulas of Lemma 5(b) and 5(c) that neither of the mixed policies AC or BC is ever used.

Let us first of all show that if $D < 0$, C is never used. To do this we compare the result of using AB repeatedly with that obtained from using C.

When AB is used continually, an easy calculation yields

$$f_{AB}(\infty) = \frac{r}{q + r}\,(x_0 + y_0), \tag{24.1}$$

where

$$r = \frac{r_1 r_2}{r_1 + r_2}, \qquad q = \frac{q_1 r_2 + q_2 r_1}{r_1 + r_2}. \tag{24.2}$$

Similarly the result of using C continually is

$$f_C(\infty) = \frac{r_3}{q_3 + r_3}\,(x_0 + y_0). \tag{24.3}$$

The inequality $f_{AB}(\infty) > f_C(\infty)$ is equivalent to $D < 0$.
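The equivalence rests on the identity $rq_3 - r_3 q = -D/(r_1 + r_2)$, valid when $r_3 = r_4$, which gives $f_{AB}(\infty) - f_C(\infty) = -D\,(x_0 + y_0)/[(r_1 + r_2)(q + r)(q_3 + r_3)]$. The sketch below evaluates both sides for given parameters; the function name is an assumption made for illustration.

```python
def compare_ab_with_c(x0, y0, r, q):
    """Return (f_AB, f_C, D) for the degenerate case r3 = r4."""
    r1, r2, r3, r4 = r
    q1, q2, q3 = q
    if r3 != r4:
        raise ValueError("this comparison assumes r3 = r4")
    rm = r1 * r2 / (r1 + r2)                 # r of (24.2)
    qm = (q1 * r2 + q2 * r1) / (r1 + r2)     # q of (24.2)
    f_ab = rm / (qm + rm) * (x0 + y0)        # continued AB mixture
    f_c = r3 / (q3 + r3) * (x0 + y0)         # continued C
    d = q1 * r2 * r3 + q2 * r1 * r4 - q3 * r1 * r2
    return f_ab, f_c, d
```

For $r = (1, 1, 0.5, 0.5)$, taking $q = (1, 1, 0.9)$ gives $D > 0$ and $f_{AB} < f_C$, while $q = (1, 1, 1.2)$ gives $D < 0$ and $f_{AB} > f_C$.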

If $D > 0$, the above argument proves that no mixed policies are pursued. Different cases arise depending upon which of the lines $C_2 = 0$, $C_3 = 0$ pass through the positive quadrant. As before, it can be established that if $C_3 = 0$ is in the positive quadrant, it is better to use B rather than C near the y-axis. Let us now determine where the changeover from B to C can be made. Let $t_0$ be the time of changeover. For $t_0 < t < \infty$, we have

$$K_1'(t) = pC_2, \qquad K_2'(t) = -pC_3, \qquad K_3'(t) = 0. \tag{24.4}$$

Also, we must have $K_1(t_0) \le K_2(t_0) = K_3(t_0)$. Using again the remark that $K_1(\infty) = K_2(\infty) = K_3(\infty)$, we see that for $t \ge t_0$ we must have $C_3 = 0$. Thus, B is followed until the line $C_3 = 0$ is encountered and then C is followed. In this degenerate case C plays the role of BC. Similarly, changeover from A to C occurs when $C_2 = 0$ is reached. If $C_3 = 0$ does not lie within the positive quadrant, C is used up to the y-axis. If $C_2 = 0$ does not lie within, C is used up to the x-axis.

§25. Nonlinear Utility: Two-choice Problem.

Let us now consider briefly the two-choice problem treated above under the condition that we wish to maximize the expected value of some function of the total return $R$.

In view of the results obtained for the discrete problem, it is somewhat surprising to find that for every utility function $u$, which is strictly increasing and has a continuous derivative, the optimal strategy is precisely the same as that for the linear utility problem solved above.

Since any monotone-increasing utility function can be approximated arbitrarily closely by a function of the above type, it follows that this policy is optimal for any monotone-increasing utility function, although not necessarily unique. A function of this class of great theoretical and practical importance is

$$u(R) = \begin{cases} 0 & \text{for } 0 \le R < R_0,\\ 1 & \text{for } R \ge R_0. \end{cases}$$

The expected value of $u(R)$ is the probability that $R$ is greater than or equal to $R_0$.

Let the variables have their previous connotations; we obtain as before

$$\begin{aligned}
\frac{dx}{dt} &= -\phi_1(t)\,r_1 x(t), & x(0) &= x_0,\\
\frac{dy}{dt} &= -\phi_2(t)\,r_2 y(t), & y(0) &= y_0,\\
\frac{dp}{dt} &= -[\phi_1(t) q_1 + \phi_2(t) q_2]\,p(t), & p(0) &= 1.
\end{aligned} \tag{25.2}$$

Let $z(t) = x_0 + y_0 - x(t) - y(t)$, the quantity which represents the total amount of gold mined up to $t$ if the machine has survived until then. The expected value of $u(R)$ is given by the integral

$$G = -\int_0^{\infty} u(z(t))\,dp(t). \tag{25.3}$$

This is easiest seen by considering that we are paid for the total amount of gold that the machine has mined at the time that the machine is destroyed.
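As an illustration of (25.3), consider a continued A-policy, for which $z(t) = x_0(1 - e^{-r_1 t})$ and $p(t) = e^{-q_1 t}$. For the step utility with threshold $0 < R_0 < x_0$, the return reaches $R_0$ exactly when the machine survives past the time $t^*$ with $z(t^*) = R_0$, so that $G = p(t^*) = (1 - R_0/x_0)^{q_1/r_1}$; for the linear utility $u(R) = R$ the same integral reduces to $r_1 x_0/(q_1 + r_1)$. The sketch below approximates (25.3) by a Riemann-Stieltjes sum; the function names, the horizon `T`, and the step count are assumptions of the sketch.

```python
import math


def threshold_utility(R0):
    """Step utility: 1 if the total return reaches R0, else 0."""
    return lambda R: 1.0 if R >= R0 else 0.0


def expected_utility_pure_a(u, x0, r1, q1, T=100.0, steps=100000):
    """Riemann-Stieltjes sum for G = -int u(z) dp under a continued A-policy."""
    h = T / steps
    total = 0.0
    p_prev = 1.0                             # p(0) = 1
    for k in range(1, steps + 1):
        t = k * h
        p = math.exp(-q1 * t)
        # gold mined by the midpoint of the interval, had the machine survived
        z_mid = x0 * (1.0 - math.exp(-r1 * (t - 0.5 * h)))
        total += u(z_mid) * (p_prev - p)     # p_prev - p: chance of loss in this interval
        p_prev = p
    return total
```

With $x_0 = 1$, $r_1 = 0.5$, $q_1 = 0.2$ and $R_0 = 0.5$ the sum can be compared with $0.5^{0.4}$, and with $u(R) = R$ it can be compared with $0.5/0.7$.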

Our aim is to find the functions $\phi_1(t)$, $\phi_2(t)$ subject to

$$0 \le \phi_i \le 1, \qquad \phi_1 + \phi_2 = 1, \tag{25.4}$$

which maximize $G$.

Pursuing the same perturbation techniques as above, we obtain after some straightforward calculation

$$\bar G - G = \varepsilon \int_0^{\infty} [K_1(t)\beta_1(t) + K_2(t)\beta_2(t)]\,dt + o(\varepsilon), \tag{25.5}$$

where

$$\begin{aligned}
K_1 &= q_1 p(t)\,u(z(t)) - \int_t^{\infty} \bigl[p'(s)\,u'(z(s))\,r_1 x(s) - q_1 p'(s)\,u(z(s))\bigr]\,ds,\\
K_2 &= q_2 p(t)\,u(z(t)) - \int_t^{\infty} \bigl[p'(s)\,u'(z(s))\,r_2 y(s) - q_2 p'(s)\,u(z(s))\bigr]\,ds.
\end{aligned} \tag{25.6}$$

Furthermore,

$$K_1'(t) - K_2'(t) = p(t)\,u'(z(t))\,[q_1 r_2 y(t) - q_2 r_1 x(t)].$$

It follows that if we assume that $u'(z) > 0$ when $z \ge 0$, the arguments and results of the linear case carry over with very slight modifications.

§26. General Remarks.

An essential feature of our investigation lies in viewing a policy in its extensive rather than normal form, to borrow the terminology of the von Neumann theory of games. Another way of stating this is that instead of determining the complete solution for one set of initial parameters, which would correspond to determining the extremal curves in the classical theory of the calculus of variations, we attack our problem by imbedding it within the family of problems of this type with arbitrary initial parameters.

Having performed this imbedding, we seek to determine the optimal continuation from each position. A knowledge of the best next move from an arbitrary position yields the complete set of moves from any given position.

This is the approach used throughout the theory of dynamic programming. Although it may be considered a variant in problems of deterministic type, it is in many ways a necessity in problems of stochastic type.

It is possible to treat many of the classical problems in the calculus of variations by means of this technique. We shall enlarge upon this point in the near future, cf. .

To illustrate these remarks let us consider the result contained in the theorem stating that Policy A is to be employed when (6.3) holds, and Policy B when the reverse inequality holds. Each term in the inequality has an important interpretation. The left-hand side represents the ratio of the expected gain obtained using A to the probability of losing the machine. Similarly, the right-hand side represents the same ratio for B.

We see then that the verbal statement of the solution is that at each stage we maximize the ratio of expected gain to expected loss. Attractive as this seems as a general principle to describe the solution of general classes of problems of this character, it is unfortunately, or fortunately, not correct. A counter-example of Karlin and Shapiro shows that in the discrete 3-choice problem it is possible to determine the parameters in such a way that the $(x, y)$-quadrant is divided into four regions, inside of which the designated policy is optimal.

It might, however, have been expected that in the continuous version, these difficulties would disappear. The substance of Lemma 9 is that even in the continuous case the solution will not be determined by a simple criterion of the above kind. However, as Theorem 7 states, there are only three regions, as indicated in Fig. 10, if $D > 0$; and as Theorem 8 asserts, two regions if $D < 0$.

Referring to Fig. 10, we see that one boundary, that determined by $C_3 = 0$, is precisely the equality of two ratios, for the B and C actions. Furthermore, it is an absorbing boundary, in the sense that a point stays on it, once it hits it.

The second boundary, $L = 0$, seems to be of more complicated structure, and we cannot give any simple interpretation of its equation. The reason for the changeover from A to C is nonlocal, in contrast to the state of affairs at $C_3 = 0$.

In addition, the boundary is translucent rather than absorbing. A point which encounters it passes through and continues across the C-region until it strikes the line $C_3 = 0$.

Finally, let us emphasize the interesting result of §25 which states that the solution, in the two-choice problem, for a nonlinear utility function is the same as that for the linear case. This result is actually representative of a wide class of similar results for related problems, of both one-person and two-person type. We shall discuss this at another time.